diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/_yaml/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/_yaml/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..90458e9248fc8445e677b1a100fcaeeb24d26fd9 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/_yaml/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/cuda_bindings-12.9.4.dist-info/licenses/LICENSE b/URSA/.venv_ursa/lib/python3.12/site-packages/cuda_bindings-12.9.4.dist-info/licenses/LICENSE new file mode 100644 index 0000000000000000000000000000000000000000..b7d042fcee35d11ffb37ffa7b8fb7d2bee3f7999 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/cuda_bindings-12.9.4.dist-info/licenses/LICENSE @@ -0,0 +1,48 @@ +NVIDIA SOFTWARE LICENSE + +This license is a legal agreement between you and NVIDIA Corporation ("NVIDIA") and governs your use of the NVIDIA CUDA Python software and materials provided hereunder ("SOFTWARE"). + +This license can be accepted only by an adult of legal age of majority in the country in which the SOFTWARE is used. If you are under the legal age of majority, you must ask your parent or legal guardian to consent to this license. By taking delivery of the SOFTWARE, you affirm that you have reached the legal age of majority, you accept the terms of this license, and you take legal and financial responsibility for the actions of your permitted users. + +You agree to use the SOFTWARE only for purposes that are permitted by (a) this license, and (b) any applicable law, regulation or generally accepted practices or guidelines in the relevant jurisdictions. + +1. LICENSE. Subject to the terms of this license, NVIDIA grants you a non-exclusive limited license to: (a) install and use the SOFTWARE, and (b) distribute the SOFTWARE subject to the distribution requirements described in this license. NVIDIA reserves all rights, title and interest in and to the SOFTWARE not expressly granted to you under this license. + +2. DISTRIBUTION REQUIREMENTS. These are the distribution requirements for you to exercise the distribution grant: +a. The terms under which you distribute the SOFTWARE must be consistent with the terms of this license, including (without limitation) terms relating to the license grant and license restrictions and protection of NVIDIA's intellectual property rights. +b. You agree to notify NVIDIA in writing of any known or suspected distribution or use of the SOFTWARE not in compliance with the requirements of this license, and to enforce the terms of your agreements with respect to distributed SOFTWARE. + +3. LIMITATIONS. Your license to use the SOFTWARE is restricted as follows: +a. The SOFTWARE is licensed for you to develop applications only for use in systems with NVIDIA GPUs. +b. You may not reverse engineer, decompile or disassemble, or remove copyright or other proprietary notices from any portion of the SOFTWARE or copies of the SOFTWARE. +c. You may not modify or create derivative works of any portion of the SOFTWARE. +d. You may not bypass, disable, or circumvent any technical measure, encryption, security, digital rights management or authentication mechanism in the SOFTWARE. +e. You may not use the SOFTWARE in any manner that would cause it to become subject to an open source software license. 
As examples, licenses that require as a condition of use, modification, and/or distribution that the SOFTWARE be (i) disclosed or distributed in source code form; (ii) licensed for the purpose of making derivative works; or (iii) redistributable at no charge. +f. Unless you have an agreement with NVIDIA for this purpose, you may not use the SOFTWARE with any system or application where the use or failure of the system or application can reasonably be expected to threaten or result in personal injury, death, or catastrophic loss. Examples include use in avionics, navigation, military, medical, life support or other life critical applications. NVIDIA does not design, test or manufacture the SOFTWARE for these critical uses and NVIDIA shall not be liable to you or any third party, in whole or in part, for any claims or damages arising from such uses. +g. You agree to defend, indemnify and hold harmless NVIDIA and its affiliates, and their respective employees, contractors, agents, officers and directors, from and against any and all claims, damages, obligations, losses, liabilities, costs or debt, fines, restitutions and expenses (including but not limited to attorney's fees and costs incident to establishing the right of indemnification) arising out of or related to use of the SOFTWARE outside of the scope of this Agreement, or not in compliance with its terms. + +4. PRE-RELEASE. SOFTWARE versions identified as alpha, beta, preview, early access or otherwise as pre-release may not be fully functional, may contain errors or design flaws, and may have reduced or different security, privacy, availability, and reliability standards relative to commercial versions of NVIDIA software and materials. You may use a pre-release SOFTWARE version at your own risk, understanding that these versions are not intended for use in production or business-critical systems. + +5. OWNERSHIP. The SOFTWARE and the related intellectual property rights therein are and will remain the sole and exclusive property of NVIDIA or its licensors. The SOFTWARE is copyrighted and protected by the laws of the United States and other countries, and international treaty provisions. NVIDIA may make changes to the SOFTWARE, at any time without notice, but is not obligated to support or update the SOFTWARE. + +6. COMPONENTS UNDER OTHER LICENSES. The SOFTWARE may include NVIDIA or third-party components with separate legal notices or terms as may be described in proprietary notices accompanying the SOFTWARE. If and to the extent there is a conflict between the terms in this license and the license terms associated with a component, the license terms associated with the components control only to the extent necessary to resolve the conflict. + +7. FEEDBACK. You may, but don't have to, provide to NVIDIA any Feedback. "Feedback" means any suggestions, bug fixes, enhancements, modifications, feature requests or other feedback regarding the SOFTWARE. For any Feedback that you voluntarily provide, you hereby grant NVIDIA and its affiliates a perpetual, non-exclusive, worldwide, irrevocable license to use, reproduce, modify, license, sublicense (through multiple tiers of sublicensees), and distribute (through multiple tiers of distributors) the Feedback without the payment of any royalties or fees to you. NVIDIA will use Feedback at its choice. + +8. NO WARRANTIES. 
THE SOFTWARE IS PROVIDED "AS IS" WITHOUT ANY EXPRESS OR IMPLIED WARRANTY OF ANY KIND INCLUDING, BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, OR FITNESS FOR A PARTICULAR PURPOSE. NVIDIA DOES NOT WARRANT THAT THE SOFTWARE WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION THEREOF WILL BE UNINTERRUPTED OR ERROR-FREE, OR THAT ALL ERRORS WILL BE CORRECTED. + +9. LIMITATIONS OF LIABILITY. TO THE MAXIMUM EXTENT PERMITTED BY LAW, NVIDIA AND ITS AFFILIATES SHALL NOT BE LIABLE FOR ANY SPECIAL, INCIDENTAL, PUNITIVE OR CONSEQUENTIAL DAMAGES, OR ANY LOST PROFITS, PROJECT DELAYS, LOSS OF USE, LOSS OF DATA OR LOSS OF GOODWILL, OR THE COSTS OF PROCURING SUBSTITUTE PRODUCTS, ARISING OUT OF OR IN CONNECTION WITH THIS LICENSE OR THE USE OR PERFORMANCE OF THE SOFTWARE, WHETHER SUCH LIABILITY ARISES FROM ANY CLAIM BASED UPON BREACH OF CONTRACT, BREACH OF WARRANTY, TORT (INCLUDING NEGLIGENCE), PRODUCT LIABILITY OR ANY OTHER CAUSE OF ACTION OR THEORY OF LIABILITY, EVEN IF NVIDIA HAS PREVIOUSLY BEEN ADVISED OF, OR COULD REASONABLY HAVE FORESEEN, THE POSSIBILITY OF SUCH DAMAGES. IN NO EVENT WILL NVIDIA'S AND ITS AFFILIATES TOTAL CUMULATIVE LIABILITY UNDER OR ARISING OUT OF THIS LICENSE EXCEED US$10.00. THE NATURE OF THE LIABILITY OR THE NUMBER OF CLAIMS OR SUITS SHALL NOT ENLARGE OR EXTEND THIS LIMIT. + +10. TERMINATION. Your rights under this license will terminate automatically without notice from NVIDIA if you fail to comply with any term and condition of this license or if you commence or participate in any legal proceeding against NVIDIA with respect to the SOFTWARE. NVIDIA may terminate this license with advance written notice to you if NVIDIA decides to no longer provide the SOFTWARE in a country or, in NVIDIA's sole discretion, the continued use of it is no longer commercially viable. Upon any termination of this license, you agree to promptly discontinue use of the SOFTWARE and destroy all copies in your possession or control. Your prior distributions in accordance with this license are not affected by the termination of this license. All provisions of this license will survive termination, except for the license granted to you. + +11. APPLICABLE LAW. This license will be governed in all respects by the laws of the United States and of the State of Delaware as those laws are applied to contracts entered into and performed entirely within Delaware by Delaware residents, without regard to the conflicts of laws principles. The United Nations Convention on Contracts for the International Sale of Goods is specifically disclaimed. You agree to all terms of this Agreement in the English language. The state or federal courts residing in Santa Clara County, California shall have exclusive jurisdiction over any dispute or claim arising out of this license. Notwithstanding this, you agree that NVIDIA shall still be allowed to apply for injunctive remedies or an equivalent type of urgent legal relief in any jurisdiction. + +12. NO ASSIGNMENT. This license and your rights and obligations thereunder may not be assigned by you by any means or operation of law without NVIDIA's permission. Any attempted assignment not approved by NVIDIA in writing shall be void and of no effect. + +13. EXPORT. The SOFTWARE is subject to United States export laws and regulations. You agree that you will not ship, transfer or export the SOFTWARE into any country, or use the SOFTWARE in any manner, prohibited by the United States Bureau of Industry and Security or economic sanctions regulations administered by the U.S. 
Department of Treasury's Office of Foreign Assets Control (OFAC), or any applicable export laws, restrictions or regulations. These laws include restrictions on destinations, end users and end use. By accepting this license, you confirm that you are not a resident or citizen of any country currently embargoed by the U.S. and that you are not otherwise prohibited from receiving the SOFTWARE. + +14. GOVERNMENT USE. The SOFTWARE has been developed entirely at private expense and is "commercial items" consisting of "commercial computer software" and "commercial computer software documentation" provided with RESTRICTED RIGHTS. Use, duplication or disclosure by the U.S. Government or a U.S. Government subcontractor is subject to the restrictions in this license pursuant to DFARS 227.7202-3(a) or as set forth in subparagraphs (b)(1) and (2) of the Commercial Computer Software - Restricted Rights clause at FAR 52.227-19, as applicable. Contractor/manufacturer is NVIDIA, 2788 San Tomas Expressway, Santa Clara, CA 95051. + +15. ENTIRE AGREEMENT. This license is the final, complete and exclusive agreement between the parties relating to the subject matter of this license and supersedes all prior or contemporaneous understandings and agreements relating to this subject matter, whether oral or written. If any court of competent jurisdiction determines that any provision of this license is illegal, invalid or unenforceable, the remaining provisions will remain in full force and effect. This license may only be modified in a writing signed by an authorized representative of each party. + +(v. May 12, 2021) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/filelock/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/filelock/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..2cb1ce9ed67f1e7525b2def875eae915a34b0051 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/filelock/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/filelock/__pycache__/_unix.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/filelock/__pycache__/_unix.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c0a5bde3314193ebb0d4a06781d72597fabb0028 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/filelock/__pycache__/_unix.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/filelock/__pycache__/_util.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/filelock/__pycache__/_util.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e2cd6cd2d7fbe5911a02071bd5be24e889cb7cd2 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/filelock/__pycache__/_util.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/filelock/__pycache__/version.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/filelock/__pycache__/version.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..7190dbf29425e82ab837ad9c2f597307a9eed8a3 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/filelock/__pycache__/version.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/httpx-0.28.1.dist-info/licenses/LICENSE.md b/URSA/.venv_ursa/lib/python3.12/site-packages/httpx-0.28.1.dist-info/licenses/LICENSE.md new file mode 100644 index 
0000000000000000000000000000000000000000..ab79d16a3f4c6c894c028d1f7431811e8711b42b --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/httpx-0.28.1.dist-info/licenses/LICENSE.md @@ -0,0 +1,12 @@ +Copyright © 2019, [Encode OSS Ltd](https://www.encode.io/). +All rights reserved. + +Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: + +* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. + +* Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..0264634a340adc7117d0c623b2349f8f2a67a9fa Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_commit_api.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_commit_api.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..41fcfb8ebf505474b12485af4f6f38abb567fd91 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_commit_api.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_commit_scheduler.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_commit_scheduler.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..9920c83755849f66143cc362e8e7e2c91199c0d4 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_commit_scheduler.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_inference_endpoints.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_inference_endpoints.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..cf25a16450f51ef4a5d35ad0f8f318feedb5c97f Binary files /dev/null and 
b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_inference_endpoints.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_jobs_api.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_jobs_api.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..248ff1f4dfcd16ceec5c288a9214e3623e027ec9 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_jobs_api.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_local_folder.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_local_folder.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6602ee1d4cc23fa64b46725ef702c01fd3d2bc88 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_local_folder.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_login.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_login.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..b5bde19ce444c819ab36faafe0e50db00c3dcd00 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_login.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_oauth.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_oauth.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e19fb3bbbf71950ca4ee94b7a537f5389ec4c15a Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_oauth.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_snapshot_download.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_snapshot_download.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..166fb570bb2d1c066a946525a4ceaa8f5d471f5b Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_snapshot_download.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_space_api.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_space_api.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..46d31089c66f55edea2e8fcf35d5af71ccbd00d3 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_space_api.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_tensorboard_logger.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_tensorboard_logger.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..97eb44dc49cee6ce51c95f6f0a3f0f839341cfed Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_tensorboard_logger.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_upload_large_folder.cpython-312.pyc 
b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_upload_large_folder.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6956bb8714e14f5731682a16fc63c9dc90609c00 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_upload_large_folder.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_webhooks_payload.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_webhooks_payload.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3a4bc1ea5dcdbd605082968e9d760b8c7cae31d6 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_webhooks_payload.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_webhooks_server.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_webhooks_server.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..a94d067ccedbb7c9ddba4b4416edfdc79e7d6ccb Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/_webhooks_server.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/community.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/community.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..f6de4319eaa0baf94ac1cf030918943318733f6f Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/community.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/constants.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/constants.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..0bcb4a3ce1e8d176d67757b6a0793efe819729c9 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/constants.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/dataclasses.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/dataclasses.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..19d06131c7f4f72ac268aa4935dfa2be832b5af8 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/dataclasses.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/errors.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/errors.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..eeb1de581f63e3d25837dadff8eae42198ea3be4 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/errors.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/fastai_utils.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/fastai_utils.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..65a9222270fcff6b16c74bd717cee7f49c6d1039 Binary files /dev/null and 
b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/fastai_utils.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/file_download.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/file_download.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..eb949a691fad771421371926c2357b3671b7e28a Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/file_download.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/hf_file_system.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/hf_file_system.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ebb52341edcd5254043a0cf8543dc776531e8559 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/hf_file_system.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/hub_mixin.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/hub_mixin.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..4634a041c4784bc51cf42b3a2f411fbe7796aa00 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/hub_mixin.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/inference_api.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/inference_api.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c0b5f528d820211a84898d81d2f00cc46b639fd1 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/inference_api.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/keras_mixin.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/keras_mixin.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..61dd16ad2e06614449eda7d5693d91ad548673c3 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/keras_mixin.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/lfs.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/lfs.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..1bf2893eaab3fb4adc43fe34b86bde2f8d0e6f22 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/lfs.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/repocard.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/repocard.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..f8a93e97405a6a9b39be67b9f00b1b547feeda37 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/repocard.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/repocard_data.cpython-312.pyc 
b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/repocard_data.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..2467999ea2598f47a08dbd464d22c47341e19571 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/repocard_data.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/repository.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/repository.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..a47552cafc5f94671884aff082b57263aae5433b Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/__pycache__/repository.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..7a1a8d793b89e16e5fa46ec5d420ec96fe1d72fe --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__init__.py @@ -0,0 +1,27 @@ +# Copyright 2025 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +from abc import ABC, abstractmethod +from argparse import _SubParsersAction + + +class BaseHuggingfaceCLICommand(ABC): + @staticmethod + @abstractmethod + def register_subcommand(parser: _SubParsersAction): + raise NotImplementedError() + + @abstractmethod + def run(self): + raise NotImplementedError() diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..27d93513f017149229d2cb287ceaf634c05c8029 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/_cli_utils.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/_cli_utils.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..14780a94129e6a887434d043f8107ffd833ef876 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/_cli_utils.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/auth.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/auth.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..07560831be7919cf816e570250e2cfc7cc11e19d Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/auth.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/cache.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/cache.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5053d3cc70d71213088ffcca38aa8aacf8de8b2e Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/cache.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/download.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/download.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6fd32a332f6c4c69c6e2b53468a071689c60900a Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/download.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/hf.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/hf.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c192c3068068a071d174769584eccb8d49a892b2 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/hf.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/jobs.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/jobs.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ca83b50da4b80771b893b6f7118f8e004ddbc187 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/jobs.cpython-312.pyc differ diff --git 
a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/lfs.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/lfs.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ef0e377e717e3e78b357895f6ce6b45e93d9e680 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/lfs.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/repo.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/repo.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..64f923146c77b3cc72cd622c2f5dbf8d611d26b1 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/repo.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/repo_files.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/repo_files.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..bf53639901b89cb2db5b79bc775b030956790128 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/repo_files.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/system.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/system.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..1767833d9bbd0a3b13c85c710b6c1fe3736c5980 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/system.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/upload.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/upload.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..387d93c1689292f2a3bc379aab8a483db9ebd771 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/upload.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/upload_large_folder.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/upload_large_folder.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..9eb320755455e8220532137c56e45843732ee7bd Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/__pycache__/upload_large_folder.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/_cli_utils.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/_cli_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..bd56ad6896db2a257323e022896940c0ba0d68d3 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/_cli_utils.py @@ -0,0 +1,69 @@ +# Copyright 2022 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains a utility for good-looking prints.""" + +import os +from typing import List, Union + + +class ANSI: + """ + Helper for en.wikipedia.org/wiki/ANSI_escape_code + """ + + _bold = "\u001b[1m" + _gray = "\u001b[90m" + _red = "\u001b[31m" + _reset = "\u001b[0m" + _yellow = "\u001b[33m" + + @classmethod + def bold(cls, s: str) -> str: + return cls._format(s, cls._bold) + + @classmethod + def gray(cls, s: str) -> str: + return cls._format(s, cls._gray) + + @classmethod + def red(cls, s: str) -> str: + return cls._format(s, cls._bold + cls._red) + + @classmethod + def yellow(cls, s: str) -> str: + return cls._format(s, cls._yellow) + + @classmethod + def _format(cls, s: str, code: str) -> str: + if os.environ.get("NO_COLOR"): + # See https://no-color.org/ + return s + return f"{code}{s}{cls._reset}" + + +def tabulate(rows: List[List[Union[str, int]]], headers: List[str]) -> str: + """ + Inspired by: + + - stackoverflow.com/a/8356620/593036 + - stackoverflow.com/questions/9535954/printing-lists-as-tabular-data + """ + col_widths = [max(len(str(x)) for x in col) for col in zip(*rows, headers)] + row_format = ("{{:{}}} " * len(headers)).format(*col_widths) + lines = [] + lines.append(row_format.format(*headers)) + lines.append(row_format.format(*["-" * w for w in col_widths])) + for row in rows: + lines.append(row_format.format(*row)) + return "\n".join(lines) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/auth.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/auth.py new file mode 100644 index 0000000000000000000000000000000000000000..bbf475a4f8785152b992b116a69b4b16293688f3 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/auth.py @@ -0,0 +1,213 @@ +# Copyright 2020 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains commands to authenticate to the Hugging Face Hub and interact with your repositories. + +Usage: + # login and save token locally. 
+ hf auth login --token=hf_*** --add-to-git-credential + + # switch between tokens + hf auth switch + + # list all tokens + hf auth list + + # logout from all tokens + hf auth logout + + # check which account you are logged in as + hf auth whoami +""" + +from argparse import _SubParsersAction +from typing import List, Optional + +from requests.exceptions import HTTPError + +from huggingface_hub.commands import BaseHuggingfaceCLICommand +from huggingface_hub.constants import ENDPOINT +from huggingface_hub.hf_api import HfApi + +from .._login import auth_list, auth_switch, login, logout +from ..utils import get_stored_tokens, get_token, logging +from ._cli_utils import ANSI + + +logger = logging.get_logger(__name__) + +try: + from InquirerPy import inquirer + from InquirerPy.base.control import Choice + + _inquirer_py_available = True +except ImportError: + _inquirer_py_available = False + + +class AuthCommands(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction): + # Create the main 'auth' command + auth_parser = parser.add_parser("auth", help="Manage authentication (login, logout, etc.).") + auth_subparsers = auth_parser.add_subparsers(help="Authentication subcommands") + + # Show help if no subcommand is provided + auth_parser.set_defaults(func=lambda args: auth_parser.print_help()) + + # Add 'login' as a subcommand of 'auth' + login_parser = auth_subparsers.add_parser( + "login", help="Log in using a token from huggingface.co/settings/tokens" + ) + login_parser.add_argument( + "--token", + type=str, + help="Token generated from https://huggingface.co/settings/tokens", + ) + login_parser.add_argument( + "--add-to-git-credential", + action="store_true", + help="Optional: Save token to git credential helper.", + ) + login_parser.set_defaults(func=lambda args: AuthLogin(args)) + + # Add 'logout' as a subcommand of 'auth' + logout_parser = auth_subparsers.add_parser("logout", help="Log out") + logout_parser.add_argument( + "--token-name", + type=str, + help="Optional: Name of the access token to log out from.", + ) + logout_parser.set_defaults(func=lambda args: AuthLogout(args)) + + # Add 'whoami' as a subcommand of 'auth' + whoami_parser = auth_subparsers.add_parser( + "whoami", help="Find out which huggingface.co account you are logged in as." 
+ ) + whoami_parser.set_defaults(func=lambda args: AuthWhoami(args)) + + # Existing subcommands + auth_switch_parser = auth_subparsers.add_parser("switch", help="Switch between access tokens") + auth_switch_parser.add_argument( + "--token-name", + type=str, + help="Optional: Name of the access token to switch to.", + ) + auth_switch_parser.add_argument( + "--add-to-git-credential", + action="store_true", + help="Optional: Save token to git credential helper.", + ) + auth_switch_parser.set_defaults(func=lambda args: AuthSwitch(args)) + + auth_list_parser = auth_subparsers.add_parser("list", help="List all stored access tokens") + auth_list_parser.set_defaults(func=lambda args: AuthList(args)) + + +class BaseAuthCommand: + def __init__(self, args): + self.args = args + self._api = HfApi() + + +class AuthLogin(BaseAuthCommand): + def run(self): + logging.set_verbosity_info() + login( + token=self.args.token, + add_to_git_credential=self.args.add_to_git_credential, + ) + + +class AuthLogout(BaseAuthCommand): + def run(self): + logging.set_verbosity_info() + logout(token_name=self.args.token_name) + + +class AuthSwitch(BaseAuthCommand): + def run(self): + logging.set_verbosity_info() + token_name = self.args.token_name + if token_name is None: + token_name = self._select_token_name() + + if token_name is None: + print("No token name provided. Aborting.") + exit() + auth_switch(token_name, add_to_git_credential=self.args.add_to_git_credential) + + def _select_token_name(self) -> Optional[str]: + token_names = list(get_stored_tokens().keys()) + + if not token_names: + logger.error("No stored tokens found. Please login first.") + return None + + if _inquirer_py_available: + return self._select_token_name_tui(token_names) + # if inquirer is not available, use a simpler terminal UI + print("Available stored tokens:") + for i, token_name in enumerate(token_names, 1): + print(f"{i}. {token_name}") + while True: + try: + choice = input("Enter the number of the token to switch to (or 'q' to quit): ") + if choice.lower() == "q": + return None + index = int(choice) - 1 + if 0 <= index < len(token_names): + return token_names[index] + else: + print("Invalid selection. Please try again.") + except ValueError: + print("Invalid input. 
Please enter a number or 'q' to quit.") + + def _select_token_name_tui(self, token_names: List[str]) -> Optional[str]: + choices = [Choice(token_name, name=token_name) for token_name in token_names] + try: + return inquirer.select( + message="Select a token to switch to:", + choices=choices, + default=None, + ).execute() + except KeyboardInterrupt: + logger.info("Token selection cancelled.") + return None + + +class AuthList(BaseAuthCommand): + def run(self): + logging.set_verbosity_info() + auth_list() + + +class AuthWhoami(BaseAuthCommand): + def run(self): + token = get_token() + if token is None: + print("Not logged in") + exit() + try: + info = self._api.whoami(token) + print(ANSI.bold("user: "), info["name"]) + orgs = [org["name"] for org in info["orgs"]] + if orgs: + print(ANSI.bold("orgs: "), ",".join(orgs)) + + if ENDPOINT != "https://huggingface.co": + print(f"Authenticated through private endpoint: {ENDPOINT}") + except HTTPError as e: + print(e) + print(ANSI.red(e.response.text)) + exit(1) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/cache.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/cache.py new file mode 100644 index 0000000000000000000000000000000000000000..cc36ef5efd2508bcc5e32b1fbe222bb55358777c --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/cache.py @@ -0,0 +1,403 @@ +# coding=utf-8 +# Copyright 2025-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains the 'hf cache' command group with 'scan' and 'delete' subcommands.""" + +import os +import time +from argparse import Namespace, _SubParsersAction +from functools import wraps +from tempfile import mkstemp +from typing import Any, Callable, Iterable, List, Literal, Optional, Union + +from ..utils import CachedRepoInfo, CachedRevisionInfo, CacheNotFound, HFCacheInfo, scan_cache_dir +from . import BaseHuggingfaceCLICommand +from ._cli_utils import ANSI, tabulate + + +# --- DELETE helpers (from delete_cache.py) --- +try: + from InquirerPy import inquirer + from InquirerPy.base.control import Choice + from InquirerPy.separator import Separator + + _inquirer_py_available = True +except ImportError: + _inquirer_py_available = False + +SortingOption_T = Literal["alphabetical", "lastUpdated", "lastUsed", "size"] +_CANCEL_DELETION_STR = "CANCEL_DELETION" + + +def require_inquirer_py(fn: Callable) -> Callable: + @wraps(fn) + def _inner(*args, **kwargs): + if not _inquirer_py_available: + raise ImportError( + "The 'cache delete' command requires extra dependencies for the TUI.\n" + "Please run 'pip install \"huggingface_hub[cli]\"' to install them.\n" + "Otherwise, disable TUI using the '--disable-tui' flag." 
+ ) + return fn(*args, **kwargs) + + return _inner + + +class CacheCommand(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction): + cache_parser = parser.add_parser("cache", help="Manage local cache directory.") + cache_subparsers = cache_parser.add_subparsers(dest="cache_command", help="Cache subcommands") + + # Show help if no subcommand is provided + cache_parser.set_defaults(func=lambda args: cache_parser.print_help()) + + # Scan subcommand + scan_parser = cache_subparsers.add_parser("scan", help="Scan cache directory.") + scan_parser.add_argument( + "--dir", + type=str, + default=None, + help="cache directory to scan (optional). Defaults to the default HuggingFace cache.", + ) + scan_parser.add_argument( + "-v", + "--verbose", + action="count", + default=0, + help="show a more verbose output", + ) + scan_parser.set_defaults(func=CacheCommand, cache_command="scan") + # Delete subcommand + delete_parser = cache_subparsers.add_parser("delete", help="Delete revisions from the cache directory.") + delete_parser.add_argument( + "--dir", + type=str, + default=None, + help="cache directory (optional). Defaults to the default HuggingFace cache.", + ) + delete_parser.add_argument( + "--disable-tui", + action="store_true", + help=( + "Disable Terminal User Interface (TUI) mode. Useful if your platform/terminal doesn't support the multiselect menu." + ), + ) + delete_parser.add_argument( + "--sort", + nargs="?", + choices=["alphabetical", "lastUpdated", "lastUsed", "size"], + help=( + "Sort repositories by the specified criteria. Options: " + "'alphabetical' (A-Z), " + "'lastUpdated' (newest first), " + "'lastUsed' (most recent first), " + "'size' (largest first)." + ), + ) + delete_parser.set_defaults(func=CacheCommand, cache_command="delete") + + def __init__(self, args: Namespace) -> None: + self.args = args + self.verbosity: int = getattr(args, "verbose", 0) + self.cache_dir: Optional[str] = getattr(args, "dir", None) + self.disable_tui: bool = getattr(args, "disable_tui", False) + self.sort_by: Optional[SortingOption_T] = getattr(args, "sort", None) + self.cache_command: Optional[str] = getattr(args, "cache_command", None) + + def run(self): + if self.cache_command == "scan": + self._run_scan() + elif self.cache_command == "delete": + self._run_delete() + else: + print("Please specify a cache subcommand (scan or delete). Use -h for help.") + + def _run_scan(self): + try: + t0 = time.time() + hf_cache_info = scan_cache_dir(self.cache_dir) + t1 = time.time() + except CacheNotFound as exc: + cache_dir = exc.cache_dir + print(f"Cache directory not found: {cache_dir}") + return + print(get_table(hf_cache_info, verbosity=self.verbosity)) + print( + f"\nDone in {round(t1 - t0, 1)}s. Scanned {len(hf_cache_info.repos)} repo(s)" + f" for a total of {ANSI.red(hf_cache_info.size_on_disk_str)}." + ) + if len(hf_cache_info.warnings) > 0: + message = f"Got {len(hf_cache_info.warnings)} warning(s) while scanning."
+ if self.verbosity >= 3: + print(ANSI.gray(message)) + for warning in hf_cache_info.warnings: + print(ANSI.gray(str(warning))) + else: + print(ANSI.gray(message + " Use -vvv to print details.")) + + def _run_delete(self): + hf_cache_info = scan_cache_dir(self.cache_dir) + if self.disable_tui: + selected_hashes = _manual_review_no_tui(hf_cache_info, preselected=[], sort_by=self.sort_by) + else: + selected_hashes = _manual_review_tui(hf_cache_info, preselected=[], sort_by=self.sort_by) + if len(selected_hashes) > 0 and _CANCEL_DELETION_STR not in selected_hashes: + confirm_message = _get_expectations_str(hf_cache_info, selected_hashes) + " Confirm deletion ?" + if self.disable_tui: + confirmed = _ask_for_confirmation_no_tui(confirm_message) + else: + confirmed = _ask_for_confirmation_tui(confirm_message) + if confirmed: + strategy = hf_cache_info.delete_revisions(*selected_hashes) + print("Start deletion.") + strategy.execute() + print( + f"Done. Deleted {len(strategy.repos)} repo(s) and" + f" {len(strategy.snapshots)} revision(s) for a total of" + f" {strategy.expected_freed_size_str}." + ) + return + print("Deletion is cancelled. Do nothing.") + + +def get_table(hf_cache_info: HFCacheInfo, *, verbosity: int = 0) -> str: + if verbosity == 0: + return tabulate( + rows=[ + [ + repo.repo_id, + repo.repo_type, + "{:>12}".format(repo.size_on_disk_str), + repo.nb_files, + repo.last_accessed_str, + repo.last_modified_str, + ", ".join(sorted(repo.refs)), + str(repo.repo_path), + ] + for repo in sorted(hf_cache_info.repos, key=lambda repo: repo.repo_path) + ], + headers=[ + "REPO ID", + "REPO TYPE", + "SIZE ON DISK", + "NB FILES", + "LAST_ACCESSED", + "LAST_MODIFIED", + "REFS", + "LOCAL PATH", + ], + ) + else: + return tabulate( + rows=[ + [ + repo.repo_id, + repo.repo_type, + revision.commit_hash, + "{:>12}".format(revision.size_on_disk_str), + revision.nb_files, + revision.last_modified_str, + ", ".join(sorted(revision.refs)), + str(revision.snapshot_path), + ] + for repo in sorted(hf_cache_info.repos, key=lambda repo: repo.repo_path) + for revision in sorted(repo.revisions, key=lambda revision: revision.commit_hash) + ], + headers=[ + "REPO ID", + "REPO TYPE", + "REVISION", + "SIZE ON DISK", + "NB FILES", + "LAST_MODIFIED", + "REFS", + "LOCAL PATH", + ], + ) + + +def _get_repo_sorting_key(repo: CachedRepoInfo, sort_by: Optional[SortingOption_T] = None): + if sort_by == "alphabetical": + return (repo.repo_type, repo.repo_id.lower()) + elif sort_by == "lastUpdated": + return -max(rev.last_modified for rev in repo.revisions) + elif sort_by == "lastUsed": + return -repo.last_accessed + elif sort_by == "size": + return -repo.size_on_disk + else: + return (repo.repo_type, repo.repo_id) + + +@require_inquirer_py +def _manual_review_tui( + hf_cache_info: HFCacheInfo, preselected: List[str], sort_by: Optional[SortingOption_T] = None +) -> List[str]: + choices = _get_tui_choices_from_scan(repos=hf_cache_info.repos, preselected=preselected, sort_by=sort_by) + checkbox = inquirer.checkbox( + message="Select revisions to delete:", + choices=choices, + cycle=False, + height=100, + instruction=_get_expectations_str( + hf_cache_info, selected_hashes=[c.value for c in choices if isinstance(c, Choice) and c.enabled] + ), + long_instruction="Press <space> to select, <enter> to validate and <ctrl+c> to quit without modification.", + transformer=lambda result: f"{len(result)} revision(s) selected.", + ) + + def _update_expectations(_): + checkbox._instruction = _get_expectations_str( + hf_cache_info, + selected_hashes=[choice["value"] for
choice in checkbox.content_control.choices if choice["enabled"]], + ) + + checkbox.kb_func_lookup["toggle"].append({"func": _update_expectations}) + try: + return checkbox.execute() + except KeyboardInterrupt: + return [] + + +@require_inquirer_py +def _ask_for_confirmation_tui(message: str, default: bool = True) -> bool: + return inquirer.confirm(message, default=default).execute() + + +def _get_tui_choices_from_scan( + repos: Iterable[CachedRepoInfo], preselected: List[str], sort_by: Optional[SortingOption_T] = None +) -> List: + choices: List[Union["Choice", "Separator"]] = [] + choices.append( + Choice( + _CANCEL_DELETION_STR, name="None of the following (if selected, nothing will be deleted).", enabled=False + ) + ) + sorted_repos = sorted(repos, key=lambda repo: _get_repo_sorting_key(repo, sort_by)) + for repo in sorted_repos: + choices.append( + Separator( + f"\n{repo.repo_type.capitalize()} {repo.repo_id} ({repo.size_on_disk_str}, used {repo.last_accessed_str})" + ) + ) + for revision in sorted(repo.revisions, key=_revision_sorting_order): + choices.append( + Choice( + revision.commit_hash, + name=( + f"{revision.commit_hash[:8]}: {', '.join(sorted(revision.refs)) or '(detached)'} # modified {revision.last_modified_str}" + ), + enabled=revision.commit_hash in preselected, + ) + ) + return choices + + +def _manual_review_no_tui( + hf_cache_info: HFCacheInfo, preselected: List[str], sort_by: Optional[SortingOption_T] = None +) -> List[str]: + fd, tmp_path = mkstemp(suffix=".txt") + os.close(fd) + lines = [] + sorted_repos = sorted(hf_cache_info.repos, key=lambda repo: _get_repo_sorting_key(repo, sort_by)) + for repo in sorted_repos: + lines.append( + f"\n# {repo.repo_type.capitalize()} {repo.repo_id} ({repo.size_on_disk_str}, used {repo.last_accessed_str})" + ) + for revision in sorted(repo.revisions, key=_revision_sorting_order): + lines.append( + f"{'' if revision.commit_hash in preselected else '#'} {revision.commit_hash} # Refs: {', '.join(sorted(revision.refs)) or '(detached)'} # modified {revision.last_modified_str}" + ) + with open(tmp_path, "w") as f: + f.write(_MANUAL_REVIEW_NO_TUI_INSTRUCTIONS) + f.write("\n".join(lines)) + instructions = f""" + TUI is disabled. In order to select which revisions you want to delete, please edit + the following file using the text editor of your choice. Instructions for manual + editing are located at the beginning of the file. Edit the file, save it and confirm + to continue. + File to edit: {ANSI.bold(tmp_path)} + """ + print("\n".join(line.strip() for line in instructions.strip().split("\n"))) + while True: + selected_hashes = _read_manual_review_tmp_file(tmp_path) + if _ask_for_confirmation_no_tui( + _get_expectations_str(hf_cache_info, selected_hashes) + " Continue ?", default=False + ): + break + os.remove(tmp_path) + return sorted(selected_hashes) + + +def _ask_for_confirmation_no_tui(message: str, default: bool = True) -> bool: + YES = ("y", "yes", "1") + NO = ("n", "no", "0") + DEFAULT = "" + ALL = YES + NO + (DEFAULT,) + full_message = message + (" (Y/n) " if default else " (y/N) ") + while True: + answer = input(full_message).lower() + if answer == DEFAULT: + return default + if answer in YES: + return True + if answer in NO: + return False + print(f"Invalid input. Must be one of {ALL}") + + +def _get_expectations_str(hf_cache_info: HFCacheInfo, selected_hashes: List[str]) -> str: + if _CANCEL_DELETION_STR in selected_hashes: + return "Nothing will be deleted." 
+    strategy = hf_cache_info.delete_revisions(*selected_hashes)
+    return f"{len(selected_hashes)} revisions selected for a total of {strategy.expected_freed_size_str}."
+
+
+def _read_manual_review_tmp_file(tmp_path: str) -> List[str]:
+    with open(tmp_path) as f:
+        content = f.read()
+    lines = [line.strip() for line in content.split("\n")]
+    selected_lines = [line for line in lines if not line.startswith("#")]
+    selected_hashes = [line.split("#")[0].strip() for line in selected_lines]
+    return [hash for hash in selected_hashes if len(hash) > 0]
+
+
+_MANUAL_REVIEW_NO_TUI_INSTRUCTIONS = f"""
+# INSTRUCTIONS
+# ------------
+# This is a temporary file created by running `hf cache delete --disable-tui`. It contains a set of revisions that can be deleted from your local cache directory.
+#
+# Please manually review the revisions you want to delete:
+# - Revision hashes can be commented out with '#'.
+# - Only non-commented revisions in this file will be deleted.
+# - Revision hashes that are removed from this file are ignored as well.
+# - If the `{_CANCEL_DELETION_STR}` line is uncommented, the whole cache deletion is cancelled and no changes will be applied.
+#
+# Once you've manually reviewed this file, please confirm deletion in the terminal. This file will be automatically removed once done.
+# ------------
+
+# KILL SWITCH
+# ------------
+# Un-comment the following line to completely cancel the deletion process
+# {_CANCEL_DELETION_STR}
+# ------------
+
+# REVISIONS
+# ------------
+""".strip()
+
+
+def _revision_sorting_order(revision: CachedRevisionInfo) -> Any:
+    return revision.last_modified
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/download.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/download.py
new file mode 100644
index 0000000000000000000000000000000000000000..2660644e62955952f010701a823d7a8bdce1803b
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/download.py
@@ -0,0 +1,181 @@
+# coding=utf-8
+# Copyright 2023-present, the HuggingFace Inc. team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Contains a command to download files from the Hub with the CLI.
+
+Usage:
+    hf download --help
+
+    # Download file
+    hf download gpt2 config.json
+
+    # Download entire repo
+    hf download fffiloni/zeroscope --repo-type=space --revision=refs/pr/78
+
+    # Download repo with filters
+    hf download gpt2 --include="*.safetensors"
+
+    # Download with token
+    hf download Wauplin/private-model --token=hf_***
+
+    # Download quietly (no progress bar, no warnings, only the returned path)
+    hf download gpt2 config.json --quiet
+
+    # Download to local dir
+    hf download gpt2 --local-dir=./models/gpt2
+"""
+
+import warnings
+from argparse import Namespace, _SubParsersAction
+from typing import List, Optional
+
+from huggingface_hub import logging
+from huggingface_hub._snapshot_download import snapshot_download
+from huggingface_hub.commands import BaseHuggingfaceCLICommand
+from huggingface_hub.file_download import hf_hub_download
+from huggingface_hub.utils import disable_progress_bars, enable_progress_bars
+
+
+logger = logging.get_logger(__name__)
+
+
+class DownloadCommand(BaseHuggingfaceCLICommand):
+    @staticmethod
+    def register_subcommand(parser: _SubParsersAction):
+        download_parser = parser.add_parser("download", help="Download files from the Hub")
+        download_parser.add_argument(
+            "repo_id", type=str, help="ID of the repo to download from (e.g. `username/repo-name`)."
+        )
+        download_parser.add_argument(
+            "filenames", type=str, nargs="*", help="Files to download (e.g. `config.json`, `data/metadata.jsonl`)."
+        )
+        download_parser.add_argument(
+            "--repo-type",
+            choices=["model", "dataset", "space"],
+            default="model",
+            help="Type of repo to download from (defaults to 'model').",
+        )
+        download_parser.add_argument(
+            "--revision",
+            type=str,
+            help="An optional Git revision id which can be a branch name, a tag, or a commit hash.",
+        )
+        download_parser.add_argument(
+            "--include", nargs="*", type=str, help="Glob patterns to match files to download."
+        )
+        download_parser.add_argument(
+            "--exclude", nargs="*", type=str, help="Glob patterns to exclude from files to download."
+        )
+        download_parser.add_argument(
+            "--cache-dir", type=str, help="Path to the directory where to save the downloaded files."
+        )
+        download_parser.add_argument(
+            "--local-dir",
+            type=str,
+            help=(
+                "If set, the downloaded file will be placed under this directory. Check out"
+                " https://huggingface.co/docs/huggingface_hub/guides/download#download-files-to-local-folder for more"
+                " details."
+            ),
+        )
+        download_parser.add_argument(
+            "--force-download",
+            action="store_true",
+            help="If True, the files will be downloaded even if they are already cached.",
+        )
+        download_parser.add_argument(
+            "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens"
+        )
+        download_parser.add_argument(
+            "--quiet",
+            action="store_true",
+            help="If True, progress bars are disabled and only the path to the downloaded files is printed.",
+        )
+        download_parser.add_argument(
+            "--max-workers",
+            type=int,
+            default=8,
+            help="Maximum number of workers to use for downloading files. Default is 8.",
+        )
+        download_parser.set_defaults(func=DownloadCommand)
+
+    def __init__(self, args: Namespace) -> None:
+        self.token = args.token
+        self.repo_id: str = args.repo_id
+        self.filenames: List[str] = args.filenames
+        self.repo_type: str = args.repo_type
+        self.revision: Optional[str] = args.revision
+        self.include: Optional[List[str]] = args.include
+        self.exclude: Optional[List[str]] = args.exclude
+        self.cache_dir: Optional[str] = args.cache_dir
+        self.local_dir: Optional[str] = args.local_dir
+        self.force_download: bool = args.force_download
+        self.quiet: bool = args.quiet
+        self.max_workers: int = args.max_workers
+
+    def run(self) -> None:
+        if self.quiet:
+            disable_progress_bars()
+            with warnings.catch_warnings():
+                warnings.simplefilter("ignore")
+                print(self._download())  # Print path to downloaded files
+            enable_progress_bars()
+        else:
+            logging.set_verbosity_info()
+            print(self._download())  # Print path to downloaded files
+            logging.set_verbosity_warning()
+
+    def _download(self) -> str:
+        # Warn user if patterns are ignored
+        if len(self.filenames) > 0:
+            if self.include is not None and len(self.include) > 0:
+                warnings.warn("Ignoring `--include` since filenames have been explicitly set.")
+            if self.exclude is not None and len(self.exclude) > 0:
+                warnings.warn("Ignoring `--exclude` since filenames have been explicitly set.")
+
+        # Single file to download: use `hf_hub_download`
+        if len(self.filenames) == 1:
+            return hf_hub_download(
+                repo_id=self.repo_id,
+                repo_type=self.repo_type,
+                revision=self.revision,
+                filename=self.filenames[0],
+                cache_dir=self.cache_dir,
+                force_download=self.force_download,
+                token=self.token,
+                local_dir=self.local_dir,
+                library_name="huggingface-cli",
+            )
+
+        # Otherwise: use `snapshot_download` to ensure all files come from the same revision
+        elif len(self.filenames) == 0:
+            allow_patterns = self.include
+            ignore_patterns = self.exclude
+        else:
+            allow_patterns = self.filenames
+            ignore_patterns = None
+
+        return snapshot_download(
+            repo_id=self.repo_id,
+            repo_type=self.repo_type,
+            revision=self.revision,
+            allow_patterns=allow_patterns,
+            ignore_patterns=ignore_patterns,
+            force_download=self.force_download,
+            cache_dir=self.cache_dir,
+            token=self.token,
+            local_dir=self.local_dir,
+            library_name="huggingface-cli",
+            max_workers=self.max_workers,
+        )
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/hf.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/hf.py
new file mode 100644
index 0000000000000000000000000000000000000000..2587918b294b427fb8f3e0f990884826b66514a8
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/hf.py
@@ -0,0 +1,63 @@
+# Copyright 2020 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from argparse import ArgumentParser
+
+from huggingface_hub.cli.auth import AuthCommands
+from huggingface_hub.cli.cache import CacheCommand
+from huggingface_hub.cli.download import DownloadCommand
+from huggingface_hub.cli.jobs import JobsCommands
+from huggingface_hub.cli.lfs import LfsCommands
+from huggingface_hub.cli.repo import RepoCommands
+from huggingface_hub.cli.repo_files import RepoFilesCommand
+from huggingface_hub.cli.system import EnvironmentCommand, VersionCommand
+from huggingface_hub.cli.upload import UploadCommand
+from huggingface_hub.cli.upload_large_folder import UploadLargeFolderCommand
+
+
+def main():
+    parser = ArgumentParser("hf", usage="hf <command> [<args>]")
+    commands_parser = parser.add_subparsers(help="hf command helpers")
+
+    # Register commands
+    AuthCommands.register_subcommand(commands_parser)
+    CacheCommand.register_subcommand(commands_parser)
+    DownloadCommand.register_subcommand(commands_parser)
+    JobsCommands.register_subcommand(commands_parser)
+    RepoCommands.register_subcommand(commands_parser)
+    RepoFilesCommand.register_subcommand(commands_parser)
+    UploadCommand.register_subcommand(commands_parser)
+    UploadLargeFolderCommand.register_subcommand(commands_parser)
+
+    # System commands
+    EnvironmentCommand.register_subcommand(commands_parser)
+    VersionCommand.register_subcommand(commands_parser)
+
+    # LFS commands (hidden in --help)
+    LfsCommands.register_subcommand(commands_parser)
+
+    # Let's go
+    args = parser.parse_args()
+    if not hasattr(args, "func"):
+        parser.print_help()
+        exit(1)
+
+    # Run
+    service = args.func(args)
+    if service is not None:
+        service.run()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/jobs.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/jobs.py
new file mode 100644
index 0000000000000000000000000000000000000000..3a661c7df7d65813dbb1b2a8f449ca8410e320e0
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/jobs.py
@@ -0,0 +1,1100 @@
+# Copyright 2025 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Contains commands to interact with jobs on the Hugging Face Hub.
+
+Usage:
+    # run a job
+    hf jobs run <image> <command>
+
+    # List running or completed jobs
+    hf jobs ps [-a] [-f key=value] [--format TEMPLATE]
+
+    # Stream logs from a job
+    hf jobs logs <job-id>
+
+    # Inspect detailed information about a job
+    hf jobs inspect <job-id>
+
+    # Cancel a running job
+    hf jobs cancel <job-id>
+"""
+
+import json
+import os
+import re
+from argparse import Namespace, _SubParsersAction
+from dataclasses import asdict
+from pathlib import Path
+from typing import Dict, List, Optional, Union
+
+import requests
+
+from huggingface_hub import HfApi, SpaceHardware, get_token
+from huggingface_hub.utils import logging
+from huggingface_hub.utils._dotenv import load_dotenv
+
+from . 
import BaseHuggingfaceCLICommand + + +logger = logging.get_logger(__name__) + +SUGGESTED_FLAVORS = [item.value for item in SpaceHardware if item.value != "zero-a10g"] + + +class JobsCommands(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction): + jobs_parser = parser.add_parser("jobs", help="Run and manage Jobs on the Hub.") + jobs_subparsers = jobs_parser.add_subparsers(help="huggingface.co jobs related commands") + + # Show help if no subcommand is provided + jobs_parser.set_defaults(func=lambda args: jobs_parser.print_help()) + + # Register commands + InspectCommand.register_subcommand(jobs_subparsers) + LogsCommand.register_subcommand(jobs_subparsers) + PsCommand.register_subcommand(jobs_subparsers) + RunCommand.register_subcommand(jobs_subparsers) + CancelCommand.register_subcommand(jobs_subparsers) + UvCommand.register_subcommand(jobs_subparsers) + ScheduledJobsCommands.register_subcommand(jobs_subparsers) + + +class RunCommand(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction) -> None: + run_parser = parser.add_parser("run", help="Run a Job") + run_parser.add_argument("image", type=str, help="The Docker image to use.") + run_parser.add_argument("-e", "--env", action="append", help="Set environment variables. E.g. --env ENV=value") + run_parser.add_argument( + "-s", + "--secrets", + action="append", + help=( + "Set secret environment variables. E.g. --secrets SECRET=value " + "or `--secrets HF_TOKEN` to pass your Hugging Face token." + ), + ) + run_parser.add_argument("--env-file", type=str, help="Read in a file of environment variables.") + run_parser.add_argument("--secrets-file", type=str, help="Read in a file of secret environment variables.") + run_parser.add_argument( + "--flavor", + type=str, + help=f"Flavor for the hardware, as in HF Spaces. Defaults to `cpu-basic`. Possible values: {', '.join(SUGGESTED_FLAVORS)}.", + ) + run_parser.add_argument( + "--timeout", + type=str, + help="Max duration: int/float with s (seconds, default), m (minutes), h (hours) or d (days).", + ) + run_parser.add_argument( + "-d", + "--detach", + action="store_true", + help="Run the Job in the background and print the Job ID.", + ) + run_parser.add_argument( + "--namespace", + type=str, + help="The namespace where the Job will be created. 
Defaults to the current user's namespace.", + ) + run_parser.add_argument( + "--token", + type=str, + help="A User Access Token generated from https://huggingface.co/settings/tokens", + ) + run_parser.add_argument("command", nargs="...", help="The command to run.") + run_parser.set_defaults(func=RunCommand) + + def __init__(self, args: Namespace) -> None: + self.image: str = args.image + self.command: List[str] = args.command + self.env: dict[str, Optional[str]] = {} + if args.env_file: + self.env.update(load_dotenv(Path(args.env_file).read_text(), environ=os.environ.copy())) + for env_value in args.env or []: + self.env.update(load_dotenv(env_value, environ=os.environ.copy())) + self.secrets: dict[str, Optional[str]] = {} + extended_environ = _get_extended_environ() + if args.secrets_file: + self.secrets.update(load_dotenv(Path(args.secrets_file).read_text(), environ=extended_environ)) + for secret in args.secrets or []: + self.secrets.update(load_dotenv(secret, environ=extended_environ)) + self.flavor: Optional[SpaceHardware] = args.flavor + self.timeout: Optional[str] = args.timeout + self.detach: bool = args.detach + self.namespace: Optional[str] = args.namespace + self.token: Optional[str] = args.token + + def run(self) -> None: + api = HfApi(token=self.token) + job = api.run_job( + image=self.image, + command=self.command, + env=self.env, + secrets=self.secrets, + flavor=self.flavor, + timeout=self.timeout, + namespace=self.namespace, + ) + # Always print the job ID to the user + print(f"Job started with ID: {job.id}") + print(f"View at: {job.url}") + + if self.detach: + return + + # Now let's stream the logs + for log in api.fetch_job_logs(job_id=job.id): + print(log) + + +class LogsCommand(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction) -> None: + run_parser = parser.add_parser("logs", help="Fetch the logs of a Job") + run_parser.add_argument("job_id", type=str, help="Job ID") + run_parser.add_argument( + "--namespace", + type=str, + help="The namespace where the job is running. 
Defaults to the current user's namespace.", + ) + run_parser.add_argument( + "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens" + ) + run_parser.set_defaults(func=LogsCommand) + + def __init__(self, args: Namespace) -> None: + self.job_id: str = args.job_id + self.namespace: Optional[str] = args.namespace + self.token: Optional[str] = args.token + + def run(self) -> None: + api = HfApi(token=self.token) + for log in api.fetch_job_logs(job_id=self.job_id, namespace=self.namespace): + print(log) + + +def _tabulate(rows: List[List[Union[str, int]]], headers: List[str]) -> str: + """ + Inspired by: + + - stackoverflow.com/a/8356620/593036 + - stackoverflow.com/questions/9535954/printing-lists-as-tabular-data + """ + col_widths = [max(len(str(x)) for x in col) for col in zip(*rows, headers)] + terminal_width = max(os.get_terminal_size().columns, len(headers) * 12) + while len(headers) + sum(col_widths) > terminal_width: + col_to_minimize = col_widths.index(max(col_widths)) + col_widths[col_to_minimize] //= 2 + if len(headers) + sum(col_widths) <= terminal_width: + col_widths[col_to_minimize] = terminal_width - sum(col_widths) - len(headers) + col_widths[col_to_minimize] + row_format = ("{{:{}}} " * len(headers)).format(*col_widths) + lines = [] + lines.append(row_format.format(*headers)) + lines.append(row_format.format(*["-" * w for w in col_widths])) + for row in rows: + row_format_args = [ + str(x)[: col_width - 3] + "..." if len(str(x)) > col_width else str(x) + for x, col_width in zip(row, col_widths) + ] + lines.append(row_format.format(*row_format_args)) + return "\n".join(lines) + + +class PsCommand(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction) -> None: + run_parser = parser.add_parser("ps", help="List Jobs") + run_parser.add_argument( + "-a", + "--all", + action="store_true", + help="Show all Jobs (default shows just running)", + ) + run_parser.add_argument( + "--namespace", + type=str, + help="The namespace from where it lists the jobs. Defaults to the current user's namespace.", + ) + run_parser.add_argument( + "--token", + type=str, + help="A User Access Token generated from https://huggingface.co/settings/tokens", + ) + # Add Docker-style filtering argument + run_parser.add_argument( + "-f", + "--filter", + action="append", + default=[], + help="Filter output based on conditions provided (format: key=value)", + ) + # Add option to format output + run_parser.add_argument( + "--format", + type=str, + help="Format output using a custom template", + ) + run_parser.set_defaults(func=PsCommand) + + def __init__(self, args: Namespace) -> None: + self.all: bool = args.all + self.namespace: Optional[str] = args.namespace + self.token: Optional[str] = args.token + self.format: Optional[str] = args.format + self.filters: Dict[str, str] = {} + + # Parse filter arguments (key=value pairs) + for f in args.filter: + if "=" in f: + key, value = f.split("=", 1) + self.filters[key.lower()] = value + else: + print(f"Warning: Ignoring invalid filter format '{f}'. Use key=value format.") + + def run(self) -> None: + """ + Fetch and display job information for the current user. + Uses Docker-style filtering with -f/--filter flag and key=value pairs. 
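+
+        Example (illustrative; filters use the substring/glob matching in
+        `_matches_filters`, and `--format` accepts `{{.field}}` placeholders as
+        handled in `_print_output`):
+
+            hf jobs ps -a -f status=error -f image=*pytorch*
+            hf jobs ps --format "{{.id}} {{.status}}"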
+ """ + try: + api = HfApi(token=self.token) + + # Fetch jobs data + jobs = api.list_jobs(namespace=self.namespace) + + # Define table headers + table_headers = ["JOB ID", "IMAGE/SPACE", "COMMAND", "CREATED", "STATUS"] + + # Process jobs data + rows = [] + + for job in jobs: + # Extract job data for filtering + status = job.status.stage if job.status else "UNKNOWN" + + # Skip job if not all jobs should be shown and status doesn't match criteria + if not self.all and status not in ("RUNNING", "UPDATING"): + continue + + # Extract job ID + job_id = job.id + + # Extract image or space information + image_or_space = job.docker_image or "N/A" + + # Extract and format command + command = job.command or [] + command_str = " ".join(command) if command else "N/A" + + # Extract creation time + created_at = job.created_at.strftime("%Y-%m-%d %H:%M:%S") if job.created_at else "N/A" + + # Create a dict with all job properties for filtering + job_properties = { + "id": job_id, + "image": image_or_space, + "status": status.lower(), + "command": command_str, + } + + # Check if job matches all filters + if not self._matches_filters(job_properties): + continue + + # Create row + rows.append([job_id, image_or_space, command_str, created_at, status]) + + # Handle empty results + if not rows: + filters_msg = "" + if self.filters: + filters_msg = f" matching filters: {', '.join([f'{k}={v}' for k, v in self.filters.items()])}" + + print(f"No jobs found{filters_msg}") + return + + # Apply custom format if provided or use default tabular format + self._print_output(rows, table_headers) + + except requests.RequestException as e: + print(f"Error fetching jobs data: {e}") + except (KeyError, ValueError, TypeError) as e: + print(f"Error processing jobs data: {e}") + except Exception as e: + print(f"Unexpected error - {type(e).__name__}: {e}") + + def _matches_filters(self, job_properties: Dict[str, str]) -> bool: + """Check if job matches all specified filters.""" + for key, pattern in self.filters.items(): + # Check if property exists + if key not in job_properties: + return False + + # Support pattern matching with wildcards + if "*" in pattern or "?" in pattern: + # Convert glob pattern to regex + regex_pattern = pattern.replace("*", ".*").replace("?", ".") + if not re.search(f"^{regex_pattern}$", job_properties[key], re.IGNORECASE): + return False + # Simple substring matching + elif pattern.lower() not in job_properties[key].lower(): + return False + + return True + + def _print_output(self, rows, headers): + """Print output according to the chosen format.""" + if self.format: + # Custom template formatting (simplified) + template = self.format + for row in rows: + line = template + for i, field in enumerate(["id", "image", "command", "created", "status"]): + placeholder = f"{{{{.{field}}}}}" + if placeholder in line: + line = line.replace(placeholder, str(row[i])) + print(line) + else: + # Default tabular format + print( + _tabulate( + rows, + headers=headers, + ) + ) + + +class InspectCommand(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction) -> None: + run_parser = parser.add_parser("inspect", help="Display detailed information on one or more Jobs") + run_parser.add_argument( + "--namespace", + type=str, + help="The namespace where the job is running. 
Defaults to the current user's namespace.", + ) + run_parser.add_argument( + "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens" + ) + run_parser.add_argument("job_ids", nargs="...", help="The jobs to inspect") + run_parser.set_defaults(func=InspectCommand) + + def __init__(self, args: Namespace) -> None: + self.namespace: Optional[str] = args.namespace + self.token: Optional[str] = args.token + self.job_ids: List[str] = args.job_ids + + def run(self) -> None: + api = HfApi(token=self.token) + jobs = [api.inspect_job(job_id=job_id, namespace=self.namespace) for job_id in self.job_ids] + print(json.dumps([asdict(job) for job in jobs], indent=4, default=str)) + + +class CancelCommand(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction) -> None: + run_parser = parser.add_parser("cancel", help="Cancel a Job") + run_parser.add_argument("job_id", type=str, help="Job ID") + run_parser.add_argument( + "--namespace", + type=str, + help="The namespace where the job is running. Defaults to the current user's namespace.", + ) + run_parser.add_argument( + "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens" + ) + run_parser.set_defaults(func=CancelCommand) + + def __init__(self, args: Namespace) -> None: + self.job_id: str = args.job_id + self.namespace = args.namespace + self.token: Optional[str] = args.token + + def run(self) -> None: + api = HfApi(token=self.token) + api.cancel_job(job_id=self.job_id, namespace=self.namespace) + + +class UvCommand(BaseHuggingfaceCLICommand): + """Run UV scripts on Hugging Face infrastructure.""" + + @staticmethod + def register_subcommand(parser): + """Register UV run subcommand.""" + uv_parser = parser.add_parser( + "uv", + help="Run UV scripts (Python with inline dependencies) on HF infrastructure", + ) + + subparsers = uv_parser.add_subparsers(dest="uv_command", help="UV commands", required=True) + + # Run command only + run_parser = subparsers.add_parser( + "run", + help="Run a UV script (local file or URL) on HF infrastructure", + ) + run_parser.add_argument("script", help="UV script to run (local file or URL)") + run_parser.add_argument("script_args", nargs="...", help="Arguments for the script", default=[]) + run_parser.add_argument("--image", type=str, help="Use a custom Docker image with `uv` installed.") + run_parser.add_argument( + "--repo", + help="Repository name for the script (creates ephemeral if not specified)", + ) + run_parser.add_argument( + "--flavor", + type=str, + help=f"Flavor for the hardware, as in HF Spaces. Defaults to `cpu-basic`. Possible values: {', '.join(SUGGESTED_FLAVORS)}.", + ) + run_parser.add_argument("-e", "--env", action="append", help="Environment variables") + run_parser.add_argument( + "-s", + "--secrets", + action="append", + help=( + "Set secret environment variables. E.g. --secrets SECRET=value " + "or `--secrets HF_TOKEN` to pass your Hugging Face token." + ), + ) + run_parser.add_argument("--env-file", type=str, help="Read in a file of environment variables.") + run_parser.add_argument( + "--secrets-file", + type=str, + help="Read in a file of secret environment variables.", + ) + run_parser.add_argument("--timeout", type=str, help="Max duration (e.g., 30s, 5m, 1h)") + run_parser.add_argument("-d", "--detach", action="store_true", help="Run in background") + run_parser.add_argument( + "--namespace", + type=str, + help="The namespace where the Job will be created. 
Defaults to the current user's namespace.",
+        )
+        run_parser.add_argument("--token", type=str, help="HF token")
+        # UV options
+        run_parser.add_argument("--with", action="append", help="Run with the given packages installed", dest="with_")
+        run_parser.add_argument(
+            "-p", "--python", type=str, help="The Python interpreter to use for the run environment"
+        )
+        run_parser.set_defaults(func=UvCommand)
+
+    def __init__(self, args: Namespace) -> None:
+        """Initialize the command with parsed arguments."""
+        self.script = args.script
+        self.script_args = args.script_args
+        self.dependencies = args.with_
+        self.python = args.python
+        self.image = args.image
+        self.env: dict[str, Optional[str]] = {}
+        if args.env_file:
+            self.env.update(load_dotenv(Path(args.env_file).read_text(), environ=os.environ.copy()))
+        for env_value in args.env or []:
+            self.env.update(load_dotenv(env_value, environ=os.environ.copy()))
+        self.secrets: dict[str, Optional[str]] = {}
+        extended_environ = _get_extended_environ()
+        if args.secrets_file:
+            self.secrets.update(load_dotenv(Path(args.secrets_file).read_text(), environ=extended_environ))
+        for secret in args.secrets or []:
+            self.secrets.update(load_dotenv(secret, environ=extended_environ))
+        self.flavor: Optional[SpaceHardware] = args.flavor
+        self.timeout: Optional[str] = args.timeout
+        self.detach: bool = args.detach
+        self.namespace: Optional[str] = args.namespace
+        self.token: Optional[str] = args.token
+        self._repo = args.repo
+
+    def run(self) -> None:
+        """Execute UV command."""
+        logging.set_verbosity(logging.INFO)
+        api = HfApi(token=self.token)
+        job = api.run_uv_job(
+            script=self.script,
+            script_args=self.script_args,
+            dependencies=self.dependencies,
+            python=self.python,
+            image=self.image,
+            env=self.env,
+            secrets=self.secrets,
+            flavor=self.flavor,
+            timeout=self.timeout,
+            namespace=self.namespace,
+            _repo=self._repo,
+        )
+
+        # Always print the job ID to the user
+        print(f"Job started with ID: {job.id}")
+        print(f"View at: {job.url}")
+
+        if self.detach:
+            return
+
+        # Now let's stream the logs
+        for log in api.fetch_job_logs(job_id=job.id):
+            print(log)
+
+
+def _get_extended_environ() -> Dict[str, str]:
+    extended_environ = os.environ.copy()
+    if (token := get_token()) is not None:
+        extended_environ["HF_TOKEN"] = token
+    return extended_environ
+
+
+class ScheduledJobsCommands(BaseHuggingfaceCLICommand):
+    @staticmethod
+    def register_subcommand(parser: _SubParsersAction):
+        scheduled_jobs_parser = parser.add_parser("scheduled", help="Create and manage scheduled Jobs on the Hub.")
+        scheduled_jobs_subparsers = scheduled_jobs_parser.add_subparsers(
+            help="huggingface.co scheduled jobs related commands"
+        )
+
+        # Show help if no subcommand is provided
+        scheduled_jobs_parser.set_defaults(func=lambda args: scheduled_jobs_parser.print_help())
+
+        # Register commands
+        ScheduledRunCommand.register_subcommand(scheduled_jobs_subparsers)
+        ScheduledPsCommand.register_subcommand(scheduled_jobs_subparsers)
+        ScheduledInspectCommand.register_subcommand(scheduled_jobs_subparsers)
+        ScheduledDeleteCommand.register_subcommand(scheduled_jobs_subparsers)
+        ScheduledSuspendCommand.register_subcommand(scheduled_jobs_subparsers)
+        ScheduledResumeCommand.register_subcommand(scheduled_jobs_subparsers)
+        ScheduledUvCommand.register_subcommand(scheduled_jobs_subparsers)
+
+
+class ScheduledRunCommand(BaseHuggingfaceCLICommand):
+    @staticmethod
+    def register_subcommand(parser: _SubParsersAction) -> None:
+        run_parser = parser.add_parser("run", help="Schedule a 
Job") + run_parser.add_argument( + "schedule", + type=str, + help="One of annually, yearly, monthly, weekly, daily, hourly, or a CRON schedule expression.", + ) + run_parser.add_argument("image", type=str, help="The Docker image to use.") + run_parser.add_argument( + "--suspend", + action="store_true", + help="Suspend (pause) the scheduled Job", + default=None, + ) + run_parser.add_argument( + "--concurrency", + action="store_true", + help="Allow multiple instances of this Job to run concurrently", + default=None, + ) + run_parser.add_argument("-e", "--env", action="append", help="Set environment variables. E.g. --env ENV=value") + run_parser.add_argument( + "-s", + "--secrets", + action="append", + help=( + "Set secret environment variables. E.g. --secrets SECRET=value " + "or `--secrets HF_TOKEN` to pass your Hugging Face token." + ), + ) + run_parser.add_argument("--env-file", type=str, help="Read in a file of environment variables.") + run_parser.add_argument("--secrets-file", type=str, help="Read in a file of secret environment variables.") + run_parser.add_argument( + "--flavor", + type=str, + help=f"Flavor for the hardware, as in HF Spaces. Defaults to `cpu-basic`. Possible values: {', '.join(SUGGESTED_FLAVORS)}.", + ) + run_parser.add_argument( + "--timeout", + type=str, + help="Max duration: int/float with s (seconds, default), m (minutes), h (hours) or d (days).", + ) + run_parser.add_argument( + "--namespace", + type=str, + help="The namespace where the scheduled Job will be created. Defaults to the current user's namespace.", + ) + run_parser.add_argument( + "--token", + type=str, + help="A User Access Token generated from https://huggingface.co/settings/tokens", + ) + run_parser.add_argument("command", nargs="...", help="The command to run.") + run_parser.set_defaults(func=ScheduledRunCommand) + + def __init__(self, args: Namespace) -> None: + self.schedule: str = args.schedule + self.image: str = args.image + self.command: List[str] = args.command + self.suspend: Optional[bool] = args.suspend + self.concurrency: Optional[bool] = args.concurrency + self.env: dict[str, Optional[str]] = {} + if args.env_file: + self.env.update(load_dotenv(Path(args.env_file).read_text(), environ=os.environ.copy())) + for env_value in args.env or []: + self.env.update(load_dotenv(env_value, environ=os.environ.copy())) + self.secrets: dict[str, Optional[str]] = {} + extended_environ = _get_extended_environ() + if args.secrets_file: + self.secrets.update(load_dotenv(Path(args.secrets_file).read_text(), environ=extended_environ)) + for secret in args.secrets or []: + self.secrets.update(load_dotenv(secret, environ=extended_environ)) + self.flavor: Optional[SpaceHardware] = args.flavor + self.timeout: Optional[str] = args.timeout + self.namespace: Optional[str] = args.namespace + self.token: Optional[str] = args.token + + def run(self) -> None: + api = HfApi(token=self.token) + scheduled_job = api.create_scheduled_job( + image=self.image, + command=self.command, + schedule=self.schedule, + suspend=self.suspend, + concurrency=self.concurrency, + env=self.env, + secrets=self.secrets, + flavor=self.flavor, + timeout=self.timeout, + namespace=self.namespace, + ) + # Always print the scheduled job ID to the user + print(f"Scheduled Job created with ID: {scheduled_job.id}") + + +class ScheduledPsCommand(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction) -> None: + run_parser = parser.add_parser("ps", help="List scheduled Jobs") + run_parser.add_argument( + 
"-a", + "--all", + action="store_true", + help="Show all scheduled Jobs (default hides suspended)", + ) + run_parser.add_argument( + "--namespace", + type=str, + help="The namespace from where it lists the jobs. Defaults to the current user's namespace.", + ) + run_parser.add_argument( + "--token", + type=str, + help="A User Access Token generated from https://huggingface.co/settings/tokens", + ) + # Add Docker-style filtering argument + run_parser.add_argument( + "-f", + "--filter", + action="append", + default=[], + help="Filter output based on conditions provided (format: key=value)", + ) + # Add option to format output + run_parser.add_argument( + "--format", + type=str, + help="Format output using a custom template", + ) + run_parser.set_defaults(func=ScheduledPsCommand) + + def __init__(self, args: Namespace) -> None: + self.all: bool = args.all + self.namespace: Optional[str] = args.namespace + self.token: Optional[str] = args.token + self.format: Optional[str] = args.format + self.filters: Dict[str, str] = {} + + # Parse filter arguments (key=value pairs) + for f in args.filter: + if "=" in f: + key, value = f.split("=", 1) + self.filters[key.lower()] = value + else: + print(f"Warning: Ignoring invalid filter format '{f}'. Use key=value format.") + + def run(self) -> None: + """ + Fetch and display scheduked job information for the current user. + Uses Docker-style filtering with -f/--filter flag and key=value pairs. + """ + try: + api = HfApi(token=self.token) + + # Fetch jobs data + scheduled_jobs = api.list_scheduled_jobs(namespace=self.namespace) + + # Define table headers + table_headers = [ + "ID", + "SCHEDULE", + "IMAGE/SPACE", + "COMMAND", + "LAST RUN", + "NEXT RUN", + "SUSPEND", + ] + + # Process jobs data + rows = [] + + for scheduled_job in scheduled_jobs: + # Extract job data for filtering + suspend = scheduled_job.suspend + + # Skip job if not all jobs should be shown and status doesn't match criteria + if not self.all and suspend: + continue + + # Extract job ID + scheduled_job_id = scheduled_job.id + + # Extract schedule + schedule = scheduled_job.schedule + + # Extract image or space information + image_or_space = scheduled_job.job_spec.docker_image or "N/A" + + # Extract and format command + command = scheduled_job.job_spec.command or [] + command_str = " ".join(command) if command else "N/A" + + # Extract status + last_job_at = ( + scheduled_job.status.last_job.at.strftime("%Y-%m-%d %H:%M:%S") + if scheduled_job.status.last_job + else "N/A" + ) + next_job_run_at = ( + scheduled_job.status.next_job_run_at.strftime("%Y-%m-%d %H:%M:%S") + if scheduled_job.status.next_job_run_at + else "N/A" + ) + + # Create a dict with all job properties for filtering + job_properties = { + "id": scheduled_job_id, + "image": image_or_space, + "suspend": str(suspend), + "command": command_str, + } + + # Check if job matches all filters + if not self._matches_filters(job_properties): + continue + + # Create row + rows.append( + [ + scheduled_job_id, + schedule, + image_or_space, + command_str, + last_job_at, + next_job_run_at, + suspend, + ] + ) + + # Handle empty results + if not rows: + filters_msg = "" + if self.filters: + filters_msg = f" matching filters: {', '.join([f'{k}={v}' for k, v in self.filters.items()])}" + + print(f"No scheduled jobs found{filters_msg}") + return + + # Apply custom format if provided or use default tabular format + self._print_output(rows, table_headers) + + except requests.RequestException as e: + print(f"Error fetching scheduled jobs data: {e}") + 
except (KeyError, ValueError, TypeError) as e: + print(f"Error processing scheduled jobs data: {e}") + except Exception as e: + print(f"Unexpected error - {type(e).__name__}: {e}") + + def _matches_filters(self, job_properties: Dict[str, str]) -> bool: + """Check if scheduled job matches all specified filters.""" + for key, pattern in self.filters.items(): + # Check if property exists + if key not in job_properties: + return False + + # Support pattern matching with wildcards + if "*" in pattern or "?" in pattern: + # Convert glob pattern to regex + regex_pattern = pattern.replace("*", ".*").replace("?", ".") + if not re.search(f"^{regex_pattern}$", job_properties[key], re.IGNORECASE): + return False + # Simple substring matching + elif pattern.lower() not in job_properties[key].lower(): + return False + + return True + + def _print_output(self, rows, headers): + """Print output according to the chosen format.""" + if self.format: + # Custom template formatting (simplified) + template = self.format + for row in rows: + line = template + for i, field in enumerate( + ["id", "schedule", "image", "command", "last_job_at", "next_job_run_at", "suspend"] + ): + placeholder = f"{{{{.{field}}}}}" + if placeholder in line: + line = line.replace(placeholder, str(row[i])) + print(line) + else: + # Default tabular format + print( + _tabulate( + rows, + headers=headers, + ) + ) + + +class ScheduledInspectCommand(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction) -> None: + run_parser = parser.add_parser("inspect", help="Display detailed information on one or more scheduled Jobs") + run_parser.add_argument( + "--namespace", + type=str, + help="The namespace where the scheduled job is. Defaults to the current user's namespace.", + ) + run_parser.add_argument( + "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens" + ) + run_parser.add_argument("scheduled_job_ids", nargs="...", help="The scheduled jobs to inspect") + run_parser.set_defaults(func=ScheduledInspectCommand) + + def __init__(self, args: Namespace) -> None: + self.namespace: Optional[str] = args.namespace + self.token: Optional[str] = args.token + self.scheduled_job_ids: List[str] = args.scheduled_job_ids + + def run(self) -> None: + api = HfApi(token=self.token) + scheduled_jobs = [ + api.inspect_scheduled_job(scheduled_job_id=scheduled_job_id, namespace=self.namespace) + for scheduled_job_id in self.scheduled_job_ids + ] + print(json.dumps([asdict(scheduled_job) for scheduled_job in scheduled_jobs], indent=4, default=str)) + + +class ScheduledDeleteCommand(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction) -> None: + run_parser = parser.add_parser("delete", help="Delete a scheduled Job") + run_parser.add_argument("scheduled_job_id", type=str, help="Scheduled Job ID") + run_parser.add_argument( + "--namespace", + type=str, + help="The namespace where the scheduled job is. 
Defaults to the current user's namespace.", + ) + run_parser.add_argument( + "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens" + ) + run_parser.set_defaults(func=ScheduledDeleteCommand) + + def __init__(self, args: Namespace) -> None: + self.scheduled_job_id: str = args.scheduled_job_id + self.namespace = args.namespace + self.token: Optional[str] = args.token + + def run(self) -> None: + api = HfApi(token=self.token) + api.delete_scheduled_job(scheduled_job_id=self.scheduled_job_id, namespace=self.namespace) + + +class ScheduledSuspendCommand(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction) -> None: + run_parser = parser.add_parser("suspend", help="Suspend (pause) a scheduled Job") + run_parser.add_argument("scheduled_job_id", type=str, help="Scheduled Job ID") + run_parser.add_argument( + "--namespace", + type=str, + help="The namespace where the scheduled job is. Defaults to the current user's namespace.", + ) + run_parser.add_argument( + "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens" + ) + run_parser.set_defaults(func=ScheduledSuspendCommand) + + def __init__(self, args: Namespace) -> None: + self.scheduled_job_id: str = args.scheduled_job_id + self.namespace = args.namespace + self.token: Optional[str] = args.token + + def run(self) -> None: + api = HfApi(token=self.token) + api.suspend_scheduled_job(scheduled_job_id=self.scheduled_job_id, namespace=self.namespace) + + +class ScheduledResumeCommand(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction) -> None: + run_parser = parser.add_parser("resume", help="Resume (unpause) a scheduled Job") + run_parser.add_argument("scheduled_job_id", type=str, help="Scheduled Job ID") + run_parser.add_argument( + "--namespace", + type=str, + help="The namespace where the scheduled job is. 
Defaults to the current user's namespace.", + ) + run_parser.add_argument( + "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens" + ) + run_parser.set_defaults(func=ScheduledResumeCommand) + + def __init__(self, args: Namespace) -> None: + self.scheduled_job_id: str = args.scheduled_job_id + self.namespace = args.namespace + self.token: Optional[str] = args.token + + def run(self) -> None: + api = HfApi(token=self.token) + api.resume_scheduled_job(scheduled_job_id=self.scheduled_job_id, namespace=self.namespace) + + +class ScheduledUvCommand(BaseHuggingfaceCLICommand): + """Schedule UV scripts on Hugging Face infrastructure.""" + + @staticmethod + def register_subcommand(parser): + """Register UV run subcommand.""" + uv_parser = parser.add_parser( + "uv", + help="Schedule UV scripts (Python with inline dependencies) on HF infrastructure", + ) + + subparsers = uv_parser.add_subparsers(dest="uv_command", help="UV commands", required=True) + + # Run command only + run_parser = subparsers.add_parser( + "run", + help="Run a UV script (local file or URL) on HF infrastructure", + ) + run_parser.add_argument( + "schedule", + type=str, + help="One of annually, yearly, monthly, weekly, daily, hourly, or a CRON schedule expression.", + ) + run_parser.add_argument("script", help="UV script to run (local file or URL)") + run_parser.add_argument("script_args", nargs="...", help="Arguments for the script", default=[]) + run_parser.add_argument( + "--suspend", + action="store_true", + help="Suspend (pause) the scheduled Job", + default=None, + ) + run_parser.add_argument( + "--concurrency", + action="store_true", + help="Allow multiple instances of this Job to run concurrently", + default=None, + ) + run_parser.add_argument("--image", type=str, help="Use a custom Docker image with `uv` installed.") + run_parser.add_argument( + "--repo", + help="Repository name for the script (creates ephemeral if not specified)", + ) + run_parser.add_argument( + "--flavor", + type=str, + help=f"Flavor for the hardware, as in HF Spaces. Defaults to `cpu-basic`. Possible values: {', '.join(SUGGESTED_FLAVORS)}.", + ) + run_parser.add_argument("-e", "--env", action="append", help="Environment variables") + run_parser.add_argument( + "-s", + "--secrets", + action="append", + help=( + "Set secret environment variables. E.g. --secrets SECRET=value " + "or `--secrets HF_TOKEN` to pass your Hugging Face token." + ), + ) + run_parser.add_argument("--env-file", type=str, help="Read in a file of environment variables.") + run_parser.add_argument( + "--secrets-file", + type=str, + help="Read in a file of secret environment variables.", + ) + run_parser.add_argument("--timeout", type=str, help="Max duration (e.g., 30s, 5m, 1h)") + run_parser.add_argument("-d", "--detach", action="store_true", help="Run in background") + run_parser.add_argument( + "--namespace", + type=str, + help="The namespace where the Job will be created. 
Defaults to the current user's namespace.", + ) + run_parser.add_argument("--token", type=str, help="HF token") + # UV options + run_parser.add_argument("--with", action="append", help="Run with the given packages installed", dest="with_") + run_parser.add_argument( + "-p", "--python", type=str, help="The Python interpreter to use for the run environment" + ) + run_parser.set_defaults(func=ScheduledUvCommand) + + def __init__(self, args: Namespace) -> None: + """Initialize the command with parsed arguments.""" + self.schedule: str = args.schedule + self.script = args.script + self.script_args = args.script_args + self.suspend: Optional[bool] = args.suspend + self.concurrency: Optional[bool] = args.concurrency + self.dependencies = args.with_ + self.python = args.python + self.image = args.image + self.env: dict[str, Optional[str]] = {} + if args.env_file: + self.env.update(load_dotenv(Path(args.env_file).read_text(), environ=os.environ.copy())) + for env_value in args.env or []: + self.env.update(load_dotenv(env_value, environ=os.environ.copy())) + self.secrets: dict[str, Optional[str]] = {} + extended_environ = _get_extended_environ() + if args.secrets_file: + self.secrets.update(load_dotenv(Path(args.secrets_file).read_text(), environ=extended_environ)) + for secret in args.secrets or []: + self.secrets.update(load_dotenv(secret, environ=extended_environ)) + self.flavor: Optional[SpaceHardware] = args.flavor + self.timeout: Optional[str] = args.timeout + self.detach: bool = args.detach + self.namespace: Optional[str] = args.namespace + self.token: Optional[str] = args.token + self._repo = args.repo + + def run(self) -> None: + """Schedule UV command.""" + logging.set_verbosity(logging.INFO) + api = HfApi(token=self.token) + job = api.create_scheduled_uv_job( + script=self.script, + script_args=self.script_args, + schedule=self.schedule, + suspend=self.suspend, + concurrency=self.concurrency, + dependencies=self.dependencies, + python=self.python, + image=self.image, + env=self.env, + secrets=self.secrets, + flavor=self.flavor, + timeout=self.timeout, + namespace=self.namespace, + _repo=self._repo, + ) + + # Always print the job ID to the user + print(f"Scheduled Job created with ID: {job.id}") diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/lfs.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/lfs.py new file mode 100644 index 0000000000000000000000000000000000000000..e4c5b900c816494c260f6c440843a2d83703fab5 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/lfs.py @@ -0,0 +1,198 @@ +""" +Implementation of a custom transfer agent for the transfer type "multipart" for +git-lfs. 
+
+Inspired by:
+github.com/cbartz/git-lfs-swift-transfer-agent/blob/master/git_lfs_swift_transfer.py
+
+Spec is: github.com/git-lfs/git-lfs/blob/master/docs/custom-transfers.md
+
+To launch debugger while developing:
+
+```
+[lfs "customtransfer.multipart"]
+path = /path/to/huggingface_hub/.env/bin/python
+args = -m debugpy --listen 5678 --wait-for-client /path/to/huggingface_hub/src/huggingface_hub/commands/huggingface_cli.py lfs-multipart-upload
+```
+"""
+
+import json
+import os
+import subprocess
+import sys
+from argparse import _SubParsersAction
+from typing import Dict, List, Optional
+
+from huggingface_hub.commands import BaseHuggingfaceCLICommand
+from huggingface_hub.lfs import LFS_MULTIPART_UPLOAD_COMMAND
+
+from ..utils import get_session, hf_raise_for_status, logging
+from ..utils._lfs import SliceFileObj
+
+
+logger = logging.get_logger(__name__)
+
+
+class LfsCommands(BaseHuggingfaceCLICommand):
+    """
+    Implementation of a custom transfer agent for the transfer type "multipart"
+    for git-lfs. This lets users upload large files >5GB 🔥. Spec for LFS custom
+    transfer agent is:
+    https://github.com/git-lfs/git-lfs/blob/master/docs/custom-transfers.md
+
+    This introduces two commands to the CLI:
+
+    1. $ hf lfs-enable-largefiles
+
+    This should be executed once for each model repo that contains a model file
+    >5GB. It's documented in the error message you get if you just try to git
+    push a 5GB file without having enabled it before.
+
+    2. $ hf lfs-multipart-upload
+
+    This command is called by lfs directly and is not meant to be called by the
+    user.
+    """
+
+    @staticmethod
+    def register_subcommand(parser: _SubParsersAction):
+        enable_parser = parser.add_parser("lfs-enable-largefiles", add_help=False)
+        enable_parser.add_argument("path", type=str, help="Local path to repository you want to configure.")
+        enable_parser.set_defaults(func=lambda args: LfsEnableCommand(args))
+
+        # Command will get called by git-lfs, do not call it directly.
+        upload_parser = parser.add_parser(LFS_MULTIPART_UPLOAD_COMMAND, add_help=False)
+        upload_parser.set_defaults(func=lambda args: LfsUploadCommand(args))
+
+
+class LfsEnableCommand:
+    def __init__(self, args):
+        self.args = args
+
+    def run(self):
+        local_path = os.path.abspath(self.args.path)
+        if not os.path.isdir(local_path):
+            print("This does not look like a valid git repo.")
+            exit(1)
+        subprocess.run(
+            "git config lfs.customtransfer.multipart.path hf".split(),
+            check=True,
+            cwd=local_path,
+        )
+        subprocess.run(
+            f"git config lfs.customtransfer.multipart.args {LFS_MULTIPART_UPLOAD_COMMAND}".split(),
+            check=True,
+            cwd=local_path,
+        )
+        print("Local repo set up for largefiles")
+
+
+def write_msg(msg: Dict):
+    """Write out the message in Line delimited JSON."""
+    msg_str = json.dumps(msg) + "\n"
+    sys.stdout.write(msg_str)
+    sys.stdout.flush()
+
+
+def read_msg() -> Optional[Dict]:
+    """Read Line delimited JSON from stdin."""
+    msg = json.loads(sys.stdin.readline().strip())
+
+    if "terminate" in (msg.get("type"), msg.get("event")):
+        # terminate message received
+        return None
+
+    if msg.get("event") not in ("download", "upload"):
+        logger.critical("Received unexpected message")
+        sys.exit(1)
+
+    return msg
+
+
+class LfsUploadCommand:
+    def __init__(self, args) -> None:
+        self.args = args
+
+    def run(self) -> None:
+        # Immediately after invoking a custom transfer process, git-lfs
+        # sends initiation data to the process over stdin.
+        # This tells the process useful information about the configuration.
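+        # For reference, a typical init message is a single JSON line such as
+        # (illustrative values, per the git-lfs custom-transfer spec linked in
+        # the module docstring):
+        #   {"event": "init", "operation": "upload", "remote": "origin", "concurrent": true}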
+ init_msg = json.loads(sys.stdin.readline().strip()) + if not (init_msg.get("event") == "init" and init_msg.get("operation") == "upload"): + write_msg({"error": {"code": 32, "message": "Wrong lfs init operation"}}) + sys.exit(1) + + # The transfer process should use the information it needs from the + # initiation structure, and also perform any one-off setup tasks it + # needs to do. It should then respond on stdout with a simple empty + # confirmation structure, as follows: + write_msg({}) + + # After the initiation exchange, git-lfs will send any number of + # transfer requests to the stdin of the transfer process, in a serial sequence. + while True: + msg = read_msg() + if msg is None: + # When all transfers have been processed, git-lfs will send + # a terminate event to the stdin of the transfer process. + # On receiving this message the transfer process should + # clean up and terminate. No response is expected. + sys.exit(0) + + oid = msg["oid"] + filepath = msg["path"] + completion_url = msg["action"]["href"] + header = msg["action"]["header"] + chunk_size = int(header.pop("chunk_size")) + presigned_urls: List[str] = list(header.values()) + + # Send a "started" progress event to allow other workers to start. + # Otherwise they're delayed until first "progress" event is reported, + # i.e. after the first 5GB by default (!) + write_msg( + { + "event": "progress", + "oid": oid, + "bytesSoFar": 1, + "bytesSinceLast": 0, + } + ) + + parts = [] + with open(filepath, "rb") as file: + for i, presigned_url in enumerate(presigned_urls): + with SliceFileObj( + file, + seek_from=i * chunk_size, + read_limit=chunk_size, + ) as data: + r = get_session().put(presigned_url, data=data) + hf_raise_for_status(r) + parts.append( + { + "etag": r.headers.get("etag"), + "partNumber": i + 1, + } + ) + # In order to support progress reporting while data is uploading / downloading, + # the transfer process should post messages to stdout + write_msg( + { + "event": "progress", + "oid": oid, + "bytesSoFar": (i + 1) * chunk_size, + "bytesSinceLast": chunk_size, + } + ) + # Not precise but that's ok. + + r = get_session().post( + completion_url, + json={ + "oid": oid, + "parts": parts, + }, + ) + hf_raise_for_status(r) + + write_msg({"event": "complete", "oid": oid}) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/repo.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/repo.py new file mode 100644 index 0000000000000000000000000000000000000000..ef0e3313580e3753a2617745e15762933229b15f --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/repo.py @@ -0,0 +1,249 @@ +# Copyright 2025 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains commands to interact with repositories on the Hugging Face Hub. 
+
+Usage:
+    # create a new dataset repo on the Hub
+    hf repo create my-cool-dataset --repo-type=dataset
+
+    # create a private model repo on the Hub
+    hf repo create my-cool-model --private
+"""
+
+import argparse
+from argparse import _SubParsersAction
+from typing import Optional
+
+from requests.exceptions import HTTPError
+
+from huggingface_hub.commands import BaseHuggingfaceCLICommand
+from huggingface_hub.commands._cli_utils import ANSI
+from huggingface_hub.constants import REPO_TYPES, SPACES_SDK_TYPES
+from huggingface_hub.errors import HfHubHTTPError, RepositoryNotFoundError, RevisionNotFoundError
+from huggingface_hub.hf_api import HfApi
+from huggingface_hub.utils import logging
+
+
+logger = logging.get_logger(__name__)
+
+
+class RepoCommands(BaseHuggingfaceCLICommand):
+    @staticmethod
+    def register_subcommand(parser: _SubParsersAction):
+        repo_parser = parser.add_parser("repo", help="Manage repos on the Hub.")
+        repo_subparsers = repo_parser.add_subparsers(help="huggingface.co repos related commands")
+
+        # Show help if no subcommand is provided
+        repo_parser.set_defaults(func=lambda args: repo_parser.print_help())
+
+        # CREATE
+        repo_create_parser = repo_subparsers.add_parser("create", help="Create a new repo on huggingface.co")
+        repo_create_parser.add_argument(
+            "repo_id",
+            type=str,
+            help="The ID of the repo to create (e.g. `username/repo-name`). The username is optional and will be set to your username if not provided.",
+        )
+        repo_create_parser.add_argument(
+            "--repo-type",
+            type=str,
+            help='Optional: set to "dataset" or "space" if creating a dataset or space, default is model.',
+        )
+        repo_create_parser.add_argument(
+            "--space_sdk",
+            type=str,
+            help='Optional: Hugging Face Spaces SDK type. Required when --repo-type is set to "space".',
+            choices=SPACES_SDK_TYPES,
+        )
+        repo_create_parser.add_argument(
+            "--private",
+            action="store_true",
+            help="Whether to create a private repository. Defaults to public unless the organization's default is private.",
+        )
+        repo_create_parser.add_argument(
+            "--token",
+            type=str,
+            help="Hugging Face token. Will default to the locally saved token if not provided.",
+        )
+        repo_create_parser.add_argument(
+            "--exist-ok",
+            action="store_true",
+            help="Do not raise an error if repo already exists.",
+        )
+        repo_create_parser.add_argument(
+            "--resource-group-id",
+            type=str,
+            help="Resource group in which to create the repo. Resource groups are only available for Enterprise Hub organizations.",
+        )
+        repo_create_parser.set_defaults(func=lambda args: RepoCreateCommand(args))
+
+        # TAG SUBCOMMANDS
+        repo_tag_parser = repo_subparsers.add_parser("tag", help="Manage tags for a repo on the Hub.")
+        tag_subparsers = repo_tag_parser.add_subparsers(help="Tag actions", dest="tag_action", required=True)
+
+        # tag create
+        tag_create_parser = tag_subparsers.add_parser("create", help="Create a tag for a repo.")
+        tag_create_parser.add_argument(
+            "repo_id", type=str, help="The ID of the repo to tag (e.g. `username/repo-name`)."
+        )
+        tag_create_parser.add_argument("tag", type=str, help="The name of the tag to create.")
+        tag_create_parser.add_argument("-m", "--message", type=str, help="The description of the tag to create.")
+        tag_create_parser.add_argument("--revision", type=str, help="The git revision to tag.")
+        tag_create_parser.add_argument(
+            "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens."
+        )
+        tag_create_parser.add_argument(
+            "--repo-type",
+            choices=["model", "dataset", "space"],
+            default="model",
+            help="Set the type of repository (model, dataset, or space).",
+        )
+        tag_create_parser.set_defaults(func=lambda args: RepoTagCreateCommand(args))
+
+        # tag list
+        tag_list_parser = tag_subparsers.add_parser("list", help="List tags for a repo.")
+        tag_list_parser.add_argument(
+            "repo_id", type=str, help="The ID of the repo to list tags for (e.g. `username/repo-name`)."
+        )
+        tag_list_parser.add_argument(
+            "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens."
+        )
+        tag_list_parser.add_argument(
+            "--repo-type",
+            choices=["model", "dataset", "space"],
+            default="model",
+            help="Set the type of repository (model, dataset, or space).",
+        )
+        tag_list_parser.set_defaults(func=lambda args: RepoTagListCommand(args))
+
+        # tag delete
+        tag_delete_parser = tag_subparsers.add_parser("delete", help="Delete a tag from a repo.")
+        tag_delete_parser.add_argument(
+            "repo_id", type=str, help="The ID of the repo to delete the tag from (e.g. `username/repo-name`)."
+        )
+        tag_delete_parser.add_argument("tag", type=str, help="The name of the tag to delete.")
+        tag_delete_parser.add_argument("-y", "--yes", action="store_true", help="Answer Yes to prompts automatically.")
+        tag_delete_parser.add_argument(
+            "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens."
+        )
+        tag_delete_parser.add_argument(
+            "--repo-type",
+            choices=["model", "dataset", "space"],
+            default="model",
+            help="Set the type of repository (model, dataset, or space).",
+        )
+        tag_delete_parser.set_defaults(func=lambda args: RepoTagDeleteCommand(args))
+
+
+class RepoCreateCommand:
+    def __init__(self, args: argparse.Namespace):
+        self.repo_id: str = args.repo_id
+        self.repo_type: Optional[str] = args.repo_type
+        self.space_sdk: Optional[str] = args.space_sdk
+        self.private: bool = args.private
+        self.token: Optional[str] = args.token
+        self.exist_ok: bool = args.exist_ok
+        self.resource_group_id: Optional[str] = args.resource_group_id
+        self._api = HfApi()
+
+    def run(self):
+        repo_url = self._api.create_repo(
+            repo_id=self.repo_id,
+            repo_type=self.repo_type,
+            private=self.private,
+            token=self.token,
+            exist_ok=self.exist_ok,
+            resource_group_id=self.resource_group_id,
+            space_sdk=self.space_sdk,
+        )
+        print(f"Successfully created {ANSI.bold(repo_url.repo_id)} on the Hub.")
+        print(f"Your repo is now available at {ANSI.bold(repo_url)}")
+
+
+class RepoTagCommand:
+    def __init__(self, args):
+        self.args = args
+        self.api = HfApi(token=getattr(args, "token", None))
+        self.repo_id = args.repo_id
+        self.repo_type = getattr(args, "repo_type", "model")
+        if self.repo_type not in REPO_TYPES:
+            print("Invalid --repo-type value.")
+            exit(1)
+
+
+class RepoTagCreateCommand(RepoTagCommand):
+    def run(self):
+        print(
+            f"You are about to create tag {ANSI.bold(str(self.args.tag))} on {self.repo_type} {ANSI.bold(self.repo_id)}"
+        )
+        try:
+            self.api.create_tag(
+                repo_id=self.repo_id,
+                tag=self.args.tag,
+                tag_message=getattr(self.args, "message", None),
+                revision=getattr(self.args, "revision", None),
+                repo_type=self.repo_type,
+            )
+        except RepositoryNotFoundError:
+            print(f"{self.repo_type.capitalize()} {ANSI.bold(self.repo_id)} not found.")
+            exit(1)
+        except RevisionNotFoundError:
+            print(f"Revision {ANSI.bold(str(getattr(self.args, 'revision', None)))} not found.")
+            exit(1)
+        except HfHubHTTPError as e:
+            if e.response.status_code ==
409: + print(f"Tag {ANSI.bold(str(self.args.tag))} already exists on {ANSI.bold(self.repo_id)}") + exit(1) + raise e + print(f"Tag {ANSI.bold(str(self.args.tag))} created on {ANSI.bold(self.repo_id)}") + + +class RepoTagListCommand(RepoTagCommand): + def run(self): + try: + refs = self.api.list_repo_refs( + repo_id=self.repo_id, + repo_type=self.repo_type, + ) + except RepositoryNotFoundError: + print(f"{self.repo_type.capitalize()} {ANSI.bold(self.repo_id)} not found.") + exit(1) + except HTTPError as e: + print(e) + print(ANSI.red(e.response.text)) + exit(1) + if len(refs.tags) == 0: + print("No tags found") + exit(0) + print(f"Tags for {self.repo_type} {ANSI.bold(self.repo_id)}:") + for tag in refs.tags: + print(tag.name) + + +class RepoTagDeleteCommand(RepoTagCommand): + def run(self): + print(f"You are about to delete tag {ANSI.bold(self.args.tag)} on {self.repo_type} {ANSI.bold(self.repo_id)}") + if not getattr(self.args, "yes", False): + choice = input("Proceed? [Y/n] ").lower() + if choice not in ("", "y", "yes"): + print("Abort") + exit() + try: + self.api.delete_tag(repo_id=self.repo_id, tag=self.args.tag, repo_type=self.repo_type) + except RepositoryNotFoundError: + print(f"{self.repo_type.capitalize()} {ANSI.bold(self.repo_id)} not found.") + exit(1) + except RevisionNotFoundError: + print(f"Tag {ANSI.bold(self.args.tag)} not found on {ANSI.bold(self.repo_id)}") + exit(1) + print(f"Tag {ANSI.bold(self.args.tag)} deleted on {ANSI.bold(self.repo_id)}") diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/repo_files.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/repo_files.py new file mode 100644 index 0000000000000000000000000000000000000000..403d3126e234c7561cde5fbf8f1d49d7e3271da8 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/repo_files.py @@ -0,0 +1,128 @@ +# coding=utf-8 +# Copyright 2023-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains command to update or delete files in a repository using the CLI. 
+
+Usage:
+    # delete all
+    hf repo-files delete "*"
+
+    # delete single file
+    hf repo-files delete file.txt
+
+    # delete single folder
+    hf repo-files delete folder/
+
+    # delete multiple
+    hf repo-files delete file.txt folder/ file2.txt
+
+    # delete multiple patterns
+    hf repo-files delete file.txt "*.json" "folder/*.parquet"
+
+    # delete from different revision / repo-type
+    hf repo-files delete file.txt --revision=refs/pr/1 --repo-type=dataset
+"""
+
+from argparse import _SubParsersAction
+from typing import List, Optional
+
+from huggingface_hub import logging
+from huggingface_hub.commands import BaseHuggingfaceCLICommand
+from huggingface_hub.hf_api import HfApi
+
+
+logger = logging.get_logger(__name__)
+
+
+class DeleteFilesSubCommand:
+    def __init__(self, args) -> None:
+        self.args = args
+        self.repo_id: str = args.repo_id
+        self.repo_type: Optional[str] = args.repo_type
+        self.revision: Optional[str] = args.revision
+        self.api: HfApi = HfApi(token=args.token, library_name="huggingface-cli")
+        self.patterns: List[str] = args.patterns
+        self.commit_message: Optional[str] = args.commit_message
+        self.commit_description: Optional[str] = args.commit_description
+        self.create_pr: bool = args.create_pr
+        self.token: Optional[str] = args.token
+
+    def run(self) -> None:
+        logging.set_verbosity_info()
+        url = self.api.delete_files(
+            delete_patterns=self.patterns,
+            repo_id=self.repo_id,
+            repo_type=self.repo_type,
+            revision=self.revision,
+            commit_message=self.commit_message,
+            commit_description=self.commit_description,
+            create_pr=self.create_pr,
+        )
+        print(f"Files successfully deleted from repo. Commit: {url}.")
+        logging.set_verbosity_warning()
+
+
+class RepoFilesCommand(BaseHuggingfaceCLICommand):
+    @staticmethod
+    def register_subcommand(parser: _SubParsersAction):
+        repo_files_parser = parser.add_parser("repo-files", help="Manage files in a repo on the Hub.")
+        repo_files_subparsers = repo_files_parser.add_subparsers(
+            help="Action to execute against the files.",
+            required=True,
+        )
+        delete_subparser = repo_files_subparsers.add_parser(
+            "delete",
+            help="Delete files from a repo on the Hub",
+        )
+        delete_subparser.set_defaults(func=lambda args: DeleteFilesSubCommand(args))
+        delete_subparser.add_argument(
+            "repo_id", type=str, help="The ID of the repo to manage (e.g. `username/repo-name`)."
+        )
+        delete_subparser.add_argument(
+            "patterns",
+            nargs="+",
+            type=str,
+            help="Glob patterns to match files to delete.",
+        )
+        delete_subparser.add_argument(
+            "--repo-type",
+            choices=["model", "dataset", "space"],
+            default="model",
+            help="Type of the repo to delete from (e.g. `dataset`).",
+        )
+        delete_subparser.add_argument(
+            "--revision",
+            type=str,
+            help=(
+                "An optional Git revision to push to. It can be a branch name "
+                "or a PR reference. If the revision does not"
+                " exist and `--create-pr` is not set, a branch will be automatically created."
+            ),
+        )
+        delete_subparser.add_argument(
+            "--commit-message", type=str, help="The summary / title / first line of the generated commit."
+        )
+        delete_subparser.add_argument(
+            "--commit-description", type=str, help="The description of the generated commit."
+        )
+        delete_subparser.add_argument(
+            "--create-pr", action="store_true", help="Whether to create a new Pull Request for these changes."
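+            # Note (assumed Hub behavior): with --create-pr, the deletion commit is
+            # opened as a Pull Request (served under a `refs/pr/N` revision) instead
+            # of being pushed directly to the target branch.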
+ ) + delete_subparser.add_argument( + "--token", + type=str, + help="A User Access Token generated from https://huggingface.co/settings/tokens", + ) + + repo_files_parser.set_defaults(func=RepoFilesCommand) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/system.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/system.py new file mode 100644 index 0000000000000000000000000000000000000000..03650175e9b71e329755de5c86e5bbf50569d4b7 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/system.py @@ -0,0 +1,52 @@ +# Copyright 2022 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains commands to print information about the environment and version. + +Usage: + hf env + hf version +""" + +from argparse import _SubParsersAction + +from huggingface_hub import __version__ + +from ..utils import dump_environment_info +from . import BaseHuggingfaceCLICommand + + +class EnvironmentCommand(BaseHuggingfaceCLICommand): + def __init__(self, args): + self.args = args + + @staticmethod + def register_subcommand(parser: _SubParsersAction): + env_parser = parser.add_parser("env", help="Print information about the environment.") + env_parser.set_defaults(func=EnvironmentCommand) + + def run(self) -> None: + dump_environment_info() + + +class VersionCommand(BaseHuggingfaceCLICommand): + def __init__(self, args): + self.args = args + + @staticmethod + def register_subcommand(parser: _SubParsersAction): + version_parser = parser.add_parser("version", help="Print information about the hf version.") + version_parser.set_defaults(func=VersionCommand) + + def run(self) -> None: + print(f"huggingface_hub version: {__version__}") diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/upload.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/upload.py new file mode 100644 index 0000000000000000000000000000000000000000..0306bf9f5715fdc180dc4fa9819852388fca8b99 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/upload.py @@ -0,0 +1,316 @@ +# coding=utf-8 +# Copyright 2023-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains command to upload a repo or file with the CLI. 
+
+Usage:
+    # Upload file (implicit)
+    hf upload my-cool-model ./my-cool-model.safetensors
+
+    # Upload file (explicit)
+    hf upload my-cool-model ./my-cool-model.safetensors model.safetensors
+
+    # Upload directory (implicit). If `my-cool-model/` is a directory it will be uploaded, otherwise an exception is raised.
+    hf upload my-cool-model
+
+    # Upload directory (explicit)
+    hf upload my-cool-model ./models/my-cool-model .
+
+    # Upload filtered directory (example: tensorboard logs except for the last run)
+    hf upload my-cool-model ./model/training/logs logs --include "*.tfevents.*" --exclude "*20230905*"
+
+    # Upload with wildcard
+    hf upload my-cool-model "./model/training/*.safetensors"
+
+    # Upload private dataset
+    hf upload Wauplin/my-cool-dataset ./data . --repo-type=dataset --private
+
+    # Upload with token
+    hf upload Wauplin/my-cool-model --token=hf_****
+
+    # Sync local Space with Hub (upload new files, delete removed files)
+    hf upload Wauplin/space-example --repo-type=space --exclude="/logs/*" --delete="*" --commit-message="Sync local Space with Hub"
+
+    # Schedule commits every 30 minutes
+    hf upload Wauplin/my-cool-model --every=30
+"""
+
+import os
+import time
+import warnings
+from argparse import Namespace, _SubParsersAction
+from typing import List, Optional
+
+from huggingface_hub import logging
+from huggingface_hub._commit_scheduler import CommitScheduler
+from huggingface_hub.commands import BaseHuggingfaceCLICommand
+from huggingface_hub.constants import HF_HUB_ENABLE_HF_TRANSFER
+from huggingface_hub.errors import RevisionNotFoundError
+from huggingface_hub.hf_api import HfApi
+from huggingface_hub.utils import disable_progress_bars, enable_progress_bars
+from huggingface_hub.utils._runtime import is_xet_available
+
+
+logger = logging.get_logger(__name__)
+
+
+class UploadCommand(BaseHuggingfaceCLICommand):
+    @staticmethod
+    def register_subcommand(parser: _SubParsersAction):
+        upload_parser = parser.add_parser(
+            "upload", help="Upload a file or a folder to the Hub. Recommended for single-commit uploads."
+        )
+        upload_parser.add_argument(
+            "repo_id", type=str, help="The ID of the repo to upload to (e.g. `username/repo-name`)."
+        )
+        upload_parser.add_argument(
+            "local_path",
+            nargs="?",
+            help="Local path to the file or folder to upload. Wildcard patterns are supported. Defaults to current directory.",
+        )
+        upload_parser.add_argument(
+            "path_in_repo",
+            nargs="?",
+            help="Path of the file or folder in the repo. Defaults to the relative path of the file or folder.",
+        )
+        upload_parser.add_argument(
+            "--repo-type",
+            choices=["model", "dataset", "space"],
+            default="model",
+            help="Type of the repo to upload to (e.g. `dataset`).",
+        )
+        upload_parser.add_argument(
+            "--revision",
+            type=str,
+            help=(
+                "An optional Git revision to push to. It can be a branch name or a PR reference. If the revision does not"
+                " exist and `--create-pr` is not set, a branch will be automatically created."
+            ),
+        )
+        upload_parser.add_argument(
+            "--private",
+            action="store_true",
+            help=(
+                "Whether to create a private repo if the repo doesn't exist on the Hub. Ignored if the repo already"
+                " exists."
+            ),
+        )
+        upload_parser.add_argument("--include", nargs="*", type=str, help="Glob patterns to match files to upload.")
+        upload_parser.add_argument(
+            "--exclude", nargs="*", type=str, help="Glob patterns to exclude from files to upload."
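+        # Illustrative pattern semantics (assumed, consistent with the docstring
+        # examples above): patterns are fnmatch-style globs, e.g.
+        #     hf upload my-cool-model . --include "*.safetensors" --exclude "checkpoints/*"
+        # A file matching both --include and --exclude is excluded.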
+        )
+        upload_parser.add_argument(
+            "--delete",
+            nargs="*",
+            type=str,
+            help="Glob patterns for files to be deleted from the repo while committing.",
+        )
+        upload_parser.add_argument(
+            "--commit-message", type=str, help="The summary / title / first line of the generated commit."
+        )
+        upload_parser.add_argument("--commit-description", type=str, help="The description of the generated commit.")
+        upload_parser.add_argument(
+            "--create-pr", action="store_true", help="Whether to upload content as a new Pull Request."
+        )
+        upload_parser.add_argument(
+            "--every",
+            type=float,
+            help="If set, a background job is scheduled to create commits every `every` minutes.",
+        )
+        upload_parser.add_argument(
+            "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens"
+        )
+        upload_parser.add_argument(
+            "--quiet",
+            action="store_true",
+            help="If True, progress bars are disabled and only the path to the uploaded files is printed.",
+        )
+        upload_parser.set_defaults(func=UploadCommand)
+
+    def __init__(self, args: Namespace) -> None:
+        self.repo_id: str = args.repo_id
+        self.repo_type: Optional[str] = args.repo_type
+        self.revision: Optional[str] = args.revision
+        self.private: bool = args.private
+
+        self.include: Optional[List[str]] = args.include
+        self.exclude: Optional[List[str]] = args.exclude
+        self.delete: Optional[List[str]] = args.delete
+
+        self.commit_message: Optional[str] = args.commit_message
+        self.commit_description: Optional[str] = args.commit_description
+        self.create_pr: bool = args.create_pr
+        self.api: HfApi = HfApi(token=args.token, library_name="huggingface-cli")
+        self.quiet: bool = args.quiet  # disable warnings and progress bars
+
+        # Check `--every` is valid
+        if args.every is not None and args.every <= 0:
+            raise ValueError(f"`every` must be a positive value (got '{args.every}')")
+        self.every: Optional[float] = args.every
+
+        # Resolve `local_path` and `path_in_repo`
+        repo_name: str = args.repo_id.split("/")[-1]  # e.g. "Wauplin/my-cool-model" => "my-cool-model"
+        self.local_path: str
+        self.path_in_repo: str
+
+        if args.local_path is not None and any(c in args.local_path for c in ["*", "?", "["]):
+            if args.include is not None:
+                raise ValueError("Cannot set `--include` when passing a `local_path` containing a wildcard.")
+            if args.path_in_repo is not None and args.path_in_repo != ".":
+                raise ValueError("Cannot set `path_in_repo` when passing a `local_path` containing a wildcard.")
+            self.local_path = "."
+            self.include = args.local_path
+            self.path_in_repo = "."
+        elif args.local_path is None and os.path.isfile(repo_name):
+            # Implicit case 1: user provided only a repo_id which happens to be a local file as well => upload it with the same name
+            self.local_path = repo_name
+            self.path_in_repo = repo_name
+        elif args.local_path is None and os.path.isdir(repo_name):
+            # Implicit case 2: user provided only a repo_id which happens to be a local folder as well => upload it at the root
+            self.local_path = repo_name
+            self.path_in_repo = "."
+        elif args.local_path is None:
+            # Implicit case 3: user provided only a repo_id that does not match a local file or folder
+            # => the user must explicitly provide a local_path => raise exception
+            raise ValueError(f"'{repo_name}' is not a local file or folder.
Please set `local_path` explicitly.") + elif args.path_in_repo is None and os.path.isfile(args.local_path): + # Explicit local path to file, no path in repo => upload it at root with same name + self.local_path = args.local_path + self.path_in_repo = os.path.basename(args.local_path) + elif args.path_in_repo is None: + # Explicit local path to folder, no path in repo => upload at root + self.local_path = args.local_path + self.path_in_repo = "." + else: + # Finally, if both paths are explicit + self.local_path = args.local_path + self.path_in_repo = args.path_in_repo + + def run(self) -> None: + if self.quiet: + disable_progress_bars() + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + print(self._upload()) + enable_progress_bars() + else: + logging.set_verbosity_info() + print(self._upload()) + logging.set_verbosity_warning() + + def _upload(self) -> str: + if os.path.isfile(self.local_path): + if self.include is not None and len(self.include) > 0: + warnings.warn("Ignoring `--include` since a single file is uploaded.") + if self.exclude is not None and len(self.exclude) > 0: + warnings.warn("Ignoring `--exclude` since a single file is uploaded.") + if self.delete is not None and len(self.delete) > 0: + warnings.warn("Ignoring `--delete` since a single file is uploaded.") + + if not is_xet_available() and not HF_HUB_ENABLE_HF_TRANSFER: + logger.info( + "Consider using `hf_transfer` for faster uploads. This solution comes with some limitations. See" + " https://huggingface.co/docs/huggingface_hub/hf_transfer for more details." + ) + + # Schedule commits if `every` is set + if self.every is not None: + if os.path.isfile(self.local_path): + # If file => watch entire folder + use allow_patterns + folder_path = os.path.dirname(self.local_path) + path_in_repo = ( + self.path_in_repo[: -len(self.local_path)] # remove filename from path_in_repo + if self.path_in_repo.endswith(self.local_path) + else self.path_in_repo + ) + allow_patterns = [self.local_path] + ignore_patterns = [] + else: + folder_path = self.local_path + path_in_repo = self.path_in_repo + allow_patterns = self.include or [] + ignore_patterns = self.exclude or [] + if self.delete is not None and len(self.delete) > 0: + warnings.warn("Ignoring `--delete` when uploading with scheduled commits.") + + scheduler = CommitScheduler( + folder_path=folder_path, + repo_id=self.repo_id, + repo_type=self.repo_type, + revision=self.revision, + allow_patterns=allow_patterns, + ignore_patterns=ignore_patterns, + path_in_repo=path_in_repo, + private=self.private, + every=self.every, + hf_api=self.api, + ) + print(f"Scheduling commits every {self.every} minutes to {scheduler.repo_id}.") + try: # Block main thread until KeyboardInterrupt + while True: + time.sleep(100) + except KeyboardInterrupt: + scheduler.stop() + return "Stopped scheduled commits." + + # Otherwise, create repo and proceed with the upload + if not os.path.isfile(self.local_path) and not os.path.isdir(self.local_path): + raise FileNotFoundError(f"No such file or directory: '{self.local_path}'.") + repo_id = self.api.create_repo( + repo_id=self.repo_id, + repo_type=self.repo_type, + exist_ok=True, + private=self.private, + space_sdk="gradio" if self.repo_type == "space" else None, + # ^ We don't want it to fail when uploading to a Space => let's set Gradio by default. + # ^ I'd rather not add CLI args to set it explicitly as we already have `hf repo create` for that. 
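+            # Note: `exist_ok=True` makes this call a no-op when the repo already
+            # exists, so re-running the same `hf upload` command is safe; `.repo_id`
+            # below normalizes the value to the canonical `namespace/name` returned
+            # by the Hub.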
+ ).repo_id + + # Check if branch already exists and if not, create it + if self.revision is not None and not self.create_pr: + try: + self.api.repo_info(repo_id=repo_id, repo_type=self.repo_type, revision=self.revision) + except RevisionNotFoundError: + logger.info(f"Branch '{self.revision}' not found. Creating it...") + self.api.create_branch(repo_id=repo_id, repo_type=self.repo_type, branch=self.revision, exist_ok=True) + # ^ `exist_ok=True` to avoid race concurrency issues + + # File-based upload + if os.path.isfile(self.local_path): + return self.api.upload_file( + path_or_fileobj=self.local_path, + path_in_repo=self.path_in_repo, + repo_id=repo_id, + repo_type=self.repo_type, + revision=self.revision, + commit_message=self.commit_message, + commit_description=self.commit_description, + create_pr=self.create_pr, + ) + + # Folder-based upload + else: + return self.api.upload_folder( + folder_path=self.local_path, + path_in_repo=self.path_in_repo, + repo_id=repo_id, + repo_type=self.repo_type, + revision=self.revision, + commit_message=self.commit_message, + commit_description=self.commit_description, + create_pr=self.create_pr, + allow_patterns=self.include, + ignore_patterns=self.exclude, + delete_patterns=self.delete, + ) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/upload_large_folder.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/upload_large_folder.py new file mode 100644 index 0000000000000000000000000000000000000000..675c9ffe3dcd70242a9acd7837c6c2f00d8836df --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/cli/upload_large_folder.py @@ -0,0 +1,132 @@ +# coding=utf-8 +# Copyright 2023-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains command to upload a large folder with the CLI.""" + +import os +from argparse import Namespace, _SubParsersAction +from typing import List, Optional + +from huggingface_hub import logging +from huggingface_hub.commands import BaseHuggingfaceCLICommand +from huggingface_hub.hf_api import HfApi +from huggingface_hub.utils import disable_progress_bars + +from ._cli_utils import ANSI + + +logger = logging.get_logger(__name__) + + +class UploadLargeFolderCommand(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction): + subparser = parser.add_parser( + "upload-large-folder", + help="Upload a large folder to the Hub. Recommended for resumable uploads.", + ) + subparser.add_argument( + "repo_id", type=str, help="The ID of the repo to upload to (e.g. `username/repo-name`)." + ) + subparser.add_argument("local_path", type=str, help="Local path to the file or folder to upload.") + subparser.add_argument( + "--repo-type", + choices=["model", "dataset", "space"], + help="Type of the repo to upload to (e.g. `dataset`).", + ) + subparser.add_argument( + "--revision", + type=str, + help=("An optional Git revision to push to. 
It can be a branch name or a PR reference."), + ) + subparser.add_argument( + "--private", + action="store_true", + help=( + "Whether to create a private repo if repo doesn't exist on the Hub. Ignored if the repo already exists." + ), + ) + subparser.add_argument("--include", nargs="*", type=str, help="Glob patterns to match files to upload.") + subparser.add_argument("--exclude", nargs="*", type=str, help="Glob patterns to exclude from files to upload.") + subparser.add_argument( + "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens" + ) + subparser.add_argument( + "--num-workers", type=int, help="Number of workers to use to hash, upload and commit files." + ) + subparser.add_argument("--no-report", action="store_true", help="Whether to disable regular status report.") + subparser.add_argument("--no-bars", action="store_true", help="Whether to disable progress bars.") + subparser.set_defaults(func=UploadLargeFolderCommand) + + def __init__(self, args: Namespace) -> None: + self.repo_id: str = args.repo_id + self.local_path: str = args.local_path + self.repo_type: str = args.repo_type + self.revision: Optional[str] = args.revision + self.private: bool = args.private + + self.include: Optional[List[str]] = args.include + self.exclude: Optional[List[str]] = args.exclude + + self.api: HfApi = HfApi(token=args.token, library_name="huggingface-cli") + + self.num_workers: Optional[int] = args.num_workers + self.no_report: bool = args.no_report + self.no_bars: bool = args.no_bars + + if not os.path.isdir(self.local_path): + raise ValueError("Large upload is only supported for folders.") + + def run(self) -> None: + logging.set_verbosity_info() + + print( + ANSI.yellow( + "You are about to upload a large folder to the Hub using `hf upload-large-folder`. " + "This is a new feature so feedback is very welcome!\n" + "\n" + "A few things to keep in mind:\n" + " - Repository limits still apply: https://huggingface.co/docs/hub/repositories-recommendations\n" + " - Do not start several processes in parallel.\n" + " - You can interrupt and resume the process at any time. " + "The script will pick up where it left off except for partially uploaded files that would have to be entirely reuploaded.\n" + " - Do not upload the same folder to several repositories. If you need to do so, you must delete the `./.cache/huggingface/` folder first.\n" + "\n" + f"Some temporary metadata will be stored under `{self.local_path}/.cache/huggingface`.\n" + " - You must not modify those files manually.\n" + " - You must not delete the `./.cache/huggingface/` folder while a process is running.\n" + " - You can delete the `./.cache/huggingface/` folder to reinitialize the upload state when process is not running. Files will have to be hashed and preuploaded again, except for already committed files.\n" + "\n" + "If the process output is too verbose, you can disable the progress bars with `--no-bars`. " + "You can also entirely disable the status report with `--no-report`.\n" + "\n" + "For more details, run `hf upload-large-folder --help` or check the documentation at " + "https://huggingface.co/docs/huggingface_hub/guides/upload#upload-a-large-folder." 
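+                # Note (assumed from the resume behavior described above): on restart,
+                # already-hashed and already-committed files are skipped based on the
+                # metadata stored under `<local_path>/.cache/huggingface`.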
+ ) + ) + + if self.no_bars: + disable_progress_bars() + + self.api.upload_large_folder( + repo_id=self.repo_id, + folder_path=self.local_path, + repo_type=self.repo_type, + revision=self.revision, + private=self.private, + allow_patterns=self.include, + ignore_patterns=self.exclude, + num_workers=self.num_workers, + print_report=not self.no_report, + ) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..49d088214505b9604964ab142e7f8a5b38ccd5ef --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__init__.py @@ -0,0 +1,27 @@ +# Copyright 2020 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from abc import ABC, abstractmethod +from argparse import _SubParsersAction + + +class BaseHuggingfaceCLICommand(ABC): + @staticmethod + @abstractmethod + def register_subcommand(parser: _SubParsersAction): + raise NotImplementedError() + + @abstractmethod + def run(self): + raise NotImplementedError() diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..fb791c40d53fbdae4e35ccd2c23d5b1180079cc7 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/_cli_utils.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/_cli_utils.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..aff945c4f7b5563250db5082b58d170559588024 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/_cli_utils.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/delete_cache.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/delete_cache.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..207f23f9614ddd49a4353a81e0815f27aeb627de Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/delete_cache.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/download.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/download.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..4b18a34f25079c7e7476f704e2b51e32967fa6a5 Binary files /dev/null and 
b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/download.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/env.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/env.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5f03abe00936dd370d0bf7672877b550f7554167 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/env.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/huggingface_cli.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/huggingface_cli.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..0485ffbca4431d557341da14251dbc8461050963 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/huggingface_cli.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/lfs.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/lfs.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..4071e7fe425ff788b07d48498598c03208a32553 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/lfs.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/repo.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/repo.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..a8ab0f54be694c995c1680003550248566275957 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/repo.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/repo_files.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/repo_files.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..9431076020bcaed2e4d12e654003a1312c65cf68 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/repo_files.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/scan_cache.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/scan_cache.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3b3ca60883b80e1ee641aae2d764e395b217c237 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/scan_cache.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/tag.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/tag.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e199f691c81899ed0782386319c4dd85fcfd2707 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/tag.cpython-312.pyc differ diff --git 
a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/upload.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/upload.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..238437b5698064c2c6a7d2ab2f8ccd5182028a76 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/upload.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/upload_large_folder.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/upload_large_folder.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6ec6f46bbb0a408ddb1de62ce25c5108ccbd2761 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/upload_large_folder.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/user.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/user.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..514daa191042640353f5f5a08d4045d1ed32c83e Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/user.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/version.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/version.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..712bd49a101515489f14975a41d86a7066b46c98 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/__pycache__/version.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/_cli_utils.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/_cli_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..bf4a1c0373b4d4bb71a3f4e8ea39da5a01cc79a7 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/_cli_utils.py @@ -0,0 +1,74 @@ +# Copyright 2022 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
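+# Illustrative usage (not part of the original module): the helpers below are
+# meant to be composed inline, e.g.
+#     print(ANSI.bold("gpt2"), ANSI.gray("(cached)"))
+#     print(tabulate(rows=[["gpt2", "548M"]], headers=["REPO ID", "SIZE"]))
+# and they honor the NO_COLOR convention (https://no-color.org/).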
+"""Contains a utility for good-looking prints.""" + +import os +from typing import List, Union + + +class ANSI: + """ + Helper for en.wikipedia.org/wiki/ANSI_escape_code + """ + + _bold = "\u001b[1m" + _gray = "\u001b[90m" + _red = "\u001b[31m" + _reset = "\u001b[0m" + _yellow = "\u001b[33m" + + @classmethod + def bold(cls, s: str) -> str: + return cls._format(s, cls._bold) + + @classmethod + def gray(cls, s: str) -> str: + return cls._format(s, cls._gray) + + @classmethod + def red(cls, s: str) -> str: + return cls._format(s, cls._bold + cls._red) + + @classmethod + def yellow(cls, s: str) -> str: + return cls._format(s, cls._yellow) + + @classmethod + def _format(cls, s: str, code: str) -> str: + if os.environ.get("NO_COLOR"): + # See https://no-color.org/ + return s + return f"{code}{s}{cls._reset}" + + +def tabulate(rows: List[List[Union[str, int]]], headers: List[str]) -> str: + """ + Inspired by: + + - stackoverflow.com/a/8356620/593036 + - stackoverflow.com/questions/9535954/printing-lists-as-tabular-data + """ + col_widths = [max(len(str(x)) for x in col) for col in zip(*rows, headers)] + row_format = ("{{:{}}} " * len(headers)).format(*col_widths) + lines = [] + lines.append(row_format.format(*headers)) + lines.append(row_format.format(*["-" * w for w in col_widths])) + for row in rows: + lines.append(row_format.format(*row)) + return "\n".join(lines) + + +def show_deprecation_warning(old_command: str, new_command: str): + """Show a yellow warning about deprecated CLI command.""" + print(ANSI.yellow(f"⚠️ Warning: '{old_command}' is deprecated. Use '{new_command}' instead.")) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/delete_cache.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/delete_cache.py new file mode 100644 index 0000000000000000000000000000000000000000..78ea1179678371807b3686b8acf17b9f0997035f --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/delete_cache.py @@ -0,0 +1,476 @@ +# coding=utf-8 +# Copyright 2022-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains command to delete some revisions from the HF cache directory. + +Usage: + huggingface-cli delete-cache + huggingface-cli delete-cache --disable-tui + huggingface-cli delete-cache --dir ~/.cache/huggingface/hub + huggingface-cli delete-cache --sort=size + +NOTE: + This command is based on `InquirerPy` to build the multiselect menu in the terminal. + This dependency has to be installed with `pip install "huggingface_hub[cli]"`. Since + we want to avoid as much as possible cross-platform issues, I chose a library that + is built on top of `python-prompt-toolkit` which seems to be a reference in terminal + GUI (actively maintained on both Unix and Windows, 7.9k stars). + + For the moment, the TUI feature is in beta. 
+ + See: + - https://github.com/kazhala/InquirerPy + - https://inquirerpy.readthedocs.io/en/latest/ + - https://github.com/prompt-toolkit/python-prompt-toolkit + + Other solutions could have been: + - `simple_term_menu`: would be good as well for our use case but some issues suggest + that Windows is less supported. + See: https://github.com/IngoMeyer441/simple-term-menu + - `PyInquirer`: very similar to `InquirerPy` but older and not maintained anymore. + In particular, no support of Python3.10. + See: https://github.com/CITGuru/PyInquirer + - `pick` (or `pickpack`): easy to use and flexible but built on top of Python's + standard library `curses` that is specific to Unix (not implemented on Windows). + See https://github.com/wong2/pick and https://github.com/anafvana/pickpack. + - `inquirer`: lot of traction (700 stars) but explicitly states "experimental + support of Windows". Not built on top of `python-prompt-toolkit`. + See https://github.com/magmax/python-inquirer + +TODO: add support for `huggingface-cli delete-cache aaaaaa bbbbbb cccccc (...)` ? +TODO: add "--keep-last" arg to delete revisions that are not on `main` ref +TODO: add "--filter" arg to filter repositories by name ? +TODO: add "--limit" arg to limit to X repos ? +TODO: add "-y" arg for immediate deletion ? +See discussions in https://github.com/huggingface/huggingface_hub/issues/1025. +""" + +import os +from argparse import Namespace, _SubParsersAction +from functools import wraps +from tempfile import mkstemp +from typing import Any, Callable, Iterable, List, Literal, Optional, Union + +from ..utils import CachedRepoInfo, CachedRevisionInfo, HFCacheInfo, scan_cache_dir +from . import BaseHuggingfaceCLICommand +from ._cli_utils import ANSI, show_deprecation_warning + + +try: + from InquirerPy import inquirer + from InquirerPy.base.control import Choice + from InquirerPy.separator import Separator + + _inquirer_py_available = True +except ImportError: + _inquirer_py_available = False + +SortingOption_T = Literal["alphabetical", "lastUpdated", "lastUsed", "size"] + + +def require_inquirer_py(fn: Callable) -> Callable: + """Decorator to flag methods that require `InquirerPy`.""" + + # TODO: refactor this + imports in a unified pattern across codebase + @wraps(fn) + def _inner(*args, **kwargs): + if not _inquirer_py_available: + raise ImportError( + "The `delete-cache` command requires extra dependencies to work with" + ' the TUI.\nPlease run `pip install "huggingface_hub[cli]"` to install' + " them.\nOtherwise, disable TUI using the `--disable-tui` flag." + ) + + return fn(*args, **kwargs) + + return _inner + + +# Possibility for the user to cancel deletion +_CANCEL_DELETION_STR = "CANCEL_DELETION" + + +class DeleteCacheCommand(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction): + delete_cache_parser = parser.add_parser("delete-cache", help="Delete revisions from the cache directory.") + + delete_cache_parser.add_argument( + "--dir", + type=str, + default=None, + help="cache directory (optional). Default to the default HuggingFace cache.", + ) + + delete_cache_parser.add_argument( + "--disable-tui", + action="store_true", + help=( + "Disable Terminal User Interface (TUI) mode. Useful if your" + " platform/terminal doesn't support the multiselect menu." + ), + ) + + delete_cache_parser.add_argument( + "--sort", + nargs="?", + choices=["alphabetical", "lastUpdated", "lastUsed", "size"], + help=( + "Sort repositories by the specified criteria. 
Options: "
+                "'alphabetical' (A-Z), "
+                "'lastUpdated' (newest first), "
+                "'lastUsed' (most recent first), "
+                "'size' (largest first)."
+            ),
+        )
+
+        delete_cache_parser.set_defaults(func=DeleteCacheCommand)
+
+    def __init__(self, args: Namespace) -> None:
+        self.cache_dir: Optional[str] = args.dir
+        self.disable_tui: bool = args.disable_tui
+        self.sort_by: Optional[SortingOption_T] = args.sort
+
+    def run(self):
+        """Run `delete-cache` command with or without TUI."""
+        show_deprecation_warning("huggingface-cli delete-cache", "hf cache delete")
+
+        # Scan cache directory
+        hf_cache_info = scan_cache_dir(self.cache_dir)
+
+        # Manual review from the user
+        if self.disable_tui:
+            selected_hashes = _manual_review_no_tui(hf_cache_info, preselected=[], sort_by=self.sort_by)
+        else:
+            selected_hashes = _manual_review_tui(hf_cache_info, preselected=[], sort_by=self.sort_by)
+
+        # If deletion is not cancelled
+        if len(selected_hashes) > 0 and _CANCEL_DELETION_STR not in selected_hashes:
+            confirm_message = _get_expectations_str(hf_cache_info, selected_hashes) + " Confirm deletion?"
+
+            # Confirm deletion
+            if self.disable_tui:
+                confirmed = _ask_for_confirmation_no_tui(confirm_message)
+            else:
+                confirmed = _ask_for_confirmation_tui(confirm_message)
+
+            # Deletion is confirmed
+            if confirmed:
+                strategy = hf_cache_info.delete_revisions(*selected_hashes)
+                print("Start deletion.")
+                strategy.execute()
+                print(
+                    f"Done. Deleted {len(strategy.repos)} repo(s) and"
+                    f" {len(strategy.snapshots)} revision(s) for a total of"
+                    f" {strategy.expected_freed_size_str}."
+                )
+                return
+
+        # Deletion is cancelled
+        print("Deletion is cancelled. Do nothing.")
+
+
+def _get_repo_sorting_key(repo: CachedRepoInfo, sort_by: Optional[SortingOption_T] = None):
+    if sort_by == "alphabetical":
+        return (repo.repo_type, repo.repo_id.lower())  # by type then name
+    elif sort_by == "lastUpdated":
+        return -max(rev.last_modified for rev in repo.revisions)  # newest first
+    elif sort_by == "lastUsed":
+        return -repo.last_accessed  # most recently used first
+    elif sort_by == "size":
+        return -repo.size_on_disk  # largest first
+    else:
+        return (repo.repo_type, repo.repo_id)  # default stable order
+
+
+@require_inquirer_py
+def _manual_review_tui(
+    hf_cache_info: HFCacheInfo,
+    preselected: List[str],
+    sort_by: Optional[SortingOption_T] = None,
+) -> List[str]:
+    """Ask the user for a manual review of the revisions to delete.
+
+    Displays a multi-select menu in the terminal (TUI).
+    """
+    # Define multiselect list
+    choices = _get_tui_choices_from_scan(
+        repos=hf_cache_info.repos,
+        preselected=preselected,
+        sort_by=sort_by,
+    )
+    checkbox = inquirer.checkbox(
+        message="Select revisions to delete:",
+        choices=choices,  # List of revisions with some pre-selection
+        cycle=False,  # No loop between top and bottom
+        height=100,  # Large list if possible
+        # We use the instruction to display to the user the expected effect of the
+        # deletion.
+        instruction=_get_expectations_str(
+            hf_cache_info,
+            selected_hashes=[c.value for c in choices if isinstance(c, Choice) and c.enabled],
+        ),
+        # We use the long instruction to show keybinding instructions to the user
+        long_instruction="Press <space> to select, <enter> to validate and <ctrl+c> to quit without modification.",
+        # Message that is displayed once the user validates their selection.
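+        # Illustrative: transformer(["abc123", "def456"]) renders as
+        # "2 revision(s) selected." once the user validates with <enter>.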
+ transformer=lambda result: f"{len(result)} revision(s) selected.", + ) + + # Add a callback to update the information line when a revision is + # selected/unselected + def _update_expectations(_) -> None: + # Hacky way to dynamically set an instruction message to the checkbox when + # a revision hash is selected/unselected. + checkbox._instruction = _get_expectations_str( + hf_cache_info, + selected_hashes=[choice["value"] for choice in checkbox.content_control.choices if choice["enabled"]], + ) + + checkbox.kb_func_lookup["toggle"].append({"func": _update_expectations}) + + # Finally display the form to the user. + try: + return checkbox.execute() + except KeyboardInterrupt: + return [] # Quit without deletion + + +@require_inquirer_py +def _ask_for_confirmation_tui(message: str, default: bool = True) -> bool: + """Ask for confirmation using Inquirer.""" + return inquirer.confirm(message, default=default).execute() + + +def _get_tui_choices_from_scan( + repos: Iterable[CachedRepoInfo], + preselected: List[str], + sort_by: Optional[SortingOption_T] = None, +) -> List: + """Build a list of choices from the scanned repos. + + Args: + repos (*Iterable[`CachedRepoInfo`]*): + List of scanned repos on which we want to delete revisions. + preselected (*List[`str`]*): + List of revision hashes that will be preselected. + sort_by (*Optional[SortingOption_T]*): + Sorting direction. Choices: "alphabetical", "lastUpdated", "lastUsed", "size". + + Return: + The list of choices to pass to `inquirer.checkbox`. + """ + choices: List[Union[Choice, Separator]] = [] + + # First choice is to cancel the deletion + choices.append( + Choice( + _CANCEL_DELETION_STR, + name="None of the following (if selected, nothing will be deleted).", + enabled=False, + ) + ) + + # Sort repos based on specified criteria + sorted_repos = sorted(repos, key=lambda repo: _get_repo_sorting_key(repo, sort_by)) + + for repo in sorted_repos: + # Repo as separator + choices.append( + Separator( + f"\n{repo.repo_type.capitalize()} {repo.repo_id} ({repo.size_on_disk_str}," + f" used {repo.last_accessed_str})" + ) + ) + for revision in sorted(repo.revisions, key=_revision_sorting_order): + # Revision as choice + choices.append( + Choice( + revision.commit_hash, + name=( + f"{revision.commit_hash[:8]}:" + f" {', '.join(sorted(revision.refs)) or '(detached)'} #" + f" modified {revision.last_modified_str}" + ), + enabled=revision.commit_hash in preselected, + ) + ) + + # Return choices + return choices + + +def _manual_review_no_tui( + hf_cache_info: HFCacheInfo, + preselected: List[str], + sort_by: Optional[SortingOption_T] = None, +) -> List[str]: + """Ask the user for a manual review of the revisions to delete. + + Used when TUI is disabled. Manual review happens in a separate tmp file that the + user can manually edit. + """ + # 1. Generate temporary file with delete commands. 
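+    # Illustrative shape of the generated file (fake hashes), as parsed back by
+    # `_read_manual_review_tmp_file`:
+    #
+    #     # Dataset squad (8.0G, used 2 weeks ago)
+    #       abcdef12 # Refs: main # modified 3 days ago
+    #     # deadbeef # Refs: refs/pr/1 # modified 1 month ago
+    #
+    # Uncommented lines are selected for deletion; a leading '#' deselects a revision.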
+    fd, tmp_path = mkstemp(suffix=".txt")  # suffix to make it easier to find by editors
+    os.close(fd)
+
+    lines = []
+
+    sorted_repos = sorted(hf_cache_info.repos, key=lambda repo: _get_repo_sorting_key(repo, sort_by))
+
+    for repo in sorted_repos:
+        lines.append(
+            f"\n# {repo.repo_type.capitalize()} {repo.repo_id} ({repo.size_on_disk_str},"
+            f" used {repo.last_accessed_str})"
+        )
+        for revision in sorted(repo.revisions, key=_revision_sorting_order):
+            lines.append(
+                # Deselect by prepending a '#'
+                f"{'' if revision.commit_hash in preselected else '#'} "
+                f" {revision.commit_hash} # Refs:"
+                # Print `refs` as comment on same line
+                f" {', '.join(sorted(revision.refs)) or '(detached)'} # modified"
+                # Print `last_modified` as comment on same line
+                f" {revision.last_modified_str}"
+            )
+
+    with open(tmp_path, "w") as f:
+        f.write(_MANUAL_REVIEW_NO_TUI_INSTRUCTIONS)
+        f.write("\n".join(lines))
+
+    # 2. Prompt instructions to the user.
+    instructions = f"""
+    TUI is disabled. In order to select which revisions you want to delete, please edit
+    the following file using the text editor of your choice. Instructions for manual
+    editing are located at the beginning of the file. Edit the file, save it and confirm
+    to continue.
+    File to edit: {ANSI.bold(tmp_path)}
+    """
+    print("\n".join(line.strip() for line in instructions.strip().split("\n")))
+
+    # 3. Wait for user confirmation.
+    while True:
+        selected_hashes = _read_manual_review_tmp_file(tmp_path)
+        if _ask_for_confirmation_no_tui(
+            _get_expectations_str(hf_cache_info, selected_hashes) + " Continue?",
+            default=False,
+        ):
+            break
+
+    # 4. Remove the temporary file and return the selected hashes (sorted for a stable order)
+    os.remove(tmp_path)
+    return sorted(selected_hashes)
+
+
+def _ask_for_confirmation_no_tui(message: str, default: bool = True) -> bool:
+    """Ask for confirmation using pure-python."""
+    YES = ("y", "yes", "1")
+    NO = ("n", "no", "0")
+    DEFAULT = ""
+    ALL = YES + NO + (DEFAULT,)
+    full_message = message + (" (Y/n) " if default else " (y/N) ")
+    while True:
+        answer = input(full_message).lower()
+        if answer == DEFAULT:
+            return default
+        if answer in YES:
+            return True
+        if answer in NO:
+            return False
+        print(f"Invalid input. Must be one of {ALL}")
+
+
+def _get_expectations_str(hf_cache_info: HFCacheInfo, selected_hashes: List[str]) -> str:
+    """Format a string to display to the user how much space would be saved.
+
+    Example:
+    ```
+    >>> _get_expectations_str(hf_cache_info, selected_hashes)
+    '7 revisions selected counting for 4.3G.'
+    ```
+    """
+    if _CANCEL_DELETION_STR in selected_hashes:
+        return "Nothing will be deleted."
+    strategy = hf_cache_info.delete_revisions(*selected_hashes)
+    return f"{len(selected_hashes)} revisions selected counting for {strategy.expected_freed_size_str}."
+
+
+def _read_manual_review_tmp_file(tmp_path: str) -> List[str]:
+    """Read the manually reviewed instruction file and return a list of revision hashes.
+
+    Example:
+    ```txt
+    # This is the tmp file content
+    ###
+
+    # Commented out line
+    123456789 # revision hash
+
+    # Something else
+    # a_newer_hash # 2 days ago
+    an_older_hash # 3 days ago
+    ```
+
+    ```py
+    >>> _read_manual_review_tmp_file(tmp_path)
+    ['123456789', 'an_older_hash']
+    ```
+    """
+    with open(tmp_path) as f:
+        content = f.read()
+
+    # Split lines
+    lines = [line.strip() for line in content.split("\n")]
+
+    # Filter commented lines
+    selected_lines = [line for line in lines if not line.startswith("#")]
+
+    # Select only before comment
+    selected_hashes = [line.split("#")[0].strip() for line in selected_lines]
+
+    # Return revision hashes
+    return [hash for hash in selected_hashes if len(hash) > 0]
+
+
+_MANUAL_REVIEW_NO_TUI_INSTRUCTIONS = f"""
+# INSTRUCTIONS
+# ------------
+# This is a temporary file created by running `huggingface-cli delete-cache` with the
+# `--disable-tui` option. It contains a set of revisions that can be deleted from your
+# local cache directory.
+#
+# Please manually review the revisions you want to delete:
+# - Revision hashes can be commented out with '#'.
+# - Only non-commented revisions in this file will be deleted.
+# - Revision hashes that are removed from this file are ignored as well.
+# - If the `{_CANCEL_DELETION_STR}` line is uncommented, the entire deletion is cancelled and
+#   no changes will be applied.
+#
+# Once you've manually reviewed this file, please confirm deletion in the terminal. This
+# file will be automatically removed once done.
+# ------------
+
+# KILL SWITCH
+# ------------
+# Un-comment the following line to completely cancel the deletion process
+# {_CANCEL_DELETION_STR}
+# ------------
+
+# REVISIONS
+# ------------
+""".strip()
+
+
+def _revision_sorting_order(revision: CachedRevisionInfo) -> Any:
+    # Sort by last modified (oldest first)
+    return revision.last_modified
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/download.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/download.py
new file mode 100644
index 0000000000000000000000000000000000000000..0dd2c1070ead01f9ad6855de3929928d268279c2
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/download.py
@@ -0,0 +1,204 @@
+# coding=utf-8
+# Copyright 2023-present, the HuggingFace Inc. team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Contains command to download files from the Hub with the CLI.
+ +Usage: + huggingface-cli download --help + + # Download file + huggingface-cli download gpt2 config.json + + # Download entire repo + huggingface-cli download fffiloni/zeroscope --repo-type=space --revision=refs/pr/78 + + # Download repo with filters + huggingface-cli download gpt2 --include="*.safetensors" + + # Download with token + huggingface-cli download Wauplin/private-model --token=hf_*** + + # Download quietly (no progress bar, no warnings, only the returned path) + huggingface-cli download gpt2 config.json --quiet + + # Download to local dir + huggingface-cli download gpt2 --local-dir=./models/gpt2 +""" + +import warnings +from argparse import Namespace, _SubParsersAction +from typing import List, Optional + +from huggingface_hub import logging +from huggingface_hub._snapshot_download import snapshot_download +from huggingface_hub.commands import BaseHuggingfaceCLICommand +from huggingface_hub.file_download import hf_hub_download +from huggingface_hub.utils import disable_progress_bars, enable_progress_bars + +from ._cli_utils import show_deprecation_warning + + +logger = logging.get_logger(__name__) + + +class DownloadCommand(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction): + download_parser = parser.add_parser("download", help="Download files from the Hub") + download_parser.add_argument( + "repo_id", type=str, help="ID of the repo to download from (e.g. `username/repo-name`)." + ) + download_parser.add_argument( + "filenames", type=str, nargs="*", help="Files to download (e.g. `config.json`, `data/metadata.jsonl`)." + ) + download_parser.add_argument( + "--repo-type", + choices=["model", "dataset", "space"], + default="model", + help="Type of repo to download from (defaults to 'model').", + ) + download_parser.add_argument( + "--revision", + type=str, + help="An optional Git revision id which can be a branch name, a tag, or a commit hash.", + ) + download_parser.add_argument( + "--include", nargs="*", type=str, help="Glob patterns to match files to download." + ) + download_parser.add_argument( + "--exclude", nargs="*", type=str, help="Glob patterns to exclude from files to download." + ) + download_parser.add_argument( + "--cache-dir", type=str, help="Path to the directory where to save the downloaded files." + ) + download_parser.add_argument( + "--local-dir", + type=str, + help=( + "If set, the downloaded file will be placed under this directory. Check out" + " https://huggingface.co/docs/huggingface_hub/guides/download#download-files-to-local-folder for more" + " details." + ), + ) + download_parser.add_argument( + "--local-dir-use-symlinks", + choices=["auto", "True", "False"], + help=("Deprecated and ignored. Downloading to a local directory does not use symlinks anymore."), + ) + download_parser.add_argument( + "--force-download", + action="store_true", + help="If True, the files will be downloaded even if they are already cached.", + ) + download_parser.add_argument( + "--resume-download", + action="store_true", + help="Deprecated and ignored. 
Downloading a file to local dir always attempts to resume previously interrupted downloads (unless hf-transfer is enabled).",
+        )
+        download_parser.add_argument(
+            "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens"
+        )
+        download_parser.add_argument(
+            "--quiet",
+            action="store_true",
+            help="If True, progress bars are disabled and only the path to the downloaded files is printed.",
+        )
+        download_parser.add_argument(
+            "--max-workers",
+            type=int,
+            default=8,
+            help="Maximum number of workers to use for downloading files. Default is 8.",
+        )
+        download_parser.set_defaults(func=DownloadCommand)
+
+    def __init__(self, args: Namespace) -> None:
+        self.token = args.token
+        self.repo_id: str = args.repo_id
+        self.filenames: List[str] = args.filenames
+        self.repo_type: str = args.repo_type
+        self.revision: Optional[str] = args.revision
+        self.include: Optional[List[str]] = args.include
+        self.exclude: Optional[List[str]] = args.exclude
+        self.cache_dir: Optional[str] = args.cache_dir
+        self.local_dir: Optional[str] = args.local_dir
+        self.force_download: bool = args.force_download
+        self.resume_download: Optional[bool] = args.resume_download or None
+        self.quiet: bool = args.quiet
+        self.max_workers: int = args.max_workers
+
+        if args.local_dir_use_symlinks is not None:
+            warnings.warn(
+                "Ignoring --local-dir-use-symlinks. Downloading to a local directory does not use symlinks anymore.",
+                FutureWarning,
+            )
+
+    def run(self) -> None:
+        show_deprecation_warning("huggingface-cli download", "hf download")
+
+        if self.quiet:
+            disable_progress_bars()
+            with warnings.catch_warnings():
+                warnings.simplefilter("ignore")
+                print(self._download())  # Print path to downloaded files
+            enable_progress_bars()
+        else:
+            logging.set_verbosity_info()
+            print(self._download())  # Print path to downloaded files
+            logging.set_verbosity_warning()
+
+    def _download(self) -> str:
+        # Warn user if patterns are ignored
+        if len(self.filenames) > 0:
+            if self.include is not None and len(self.include) > 0:
+                warnings.warn("Ignoring `--include` since filenames have been explicitly set.")
+            if self.exclude is not None and len(self.exclude) > 0:
+                warnings.warn("Ignoring `--exclude` since filenames have been explicitly set.")
+
+        # Single file to download: use `hf_hub_download`
+        if len(self.filenames) == 1:
+            return hf_hub_download(
+                repo_id=self.repo_id,
+                repo_type=self.repo_type,
+                revision=self.revision,
+                filename=self.filenames[0],
+                cache_dir=self.cache_dir,
+                resume_download=self.resume_download,
+                force_download=self.force_download,
+                token=self.token,
+                local_dir=self.local_dir,
+                library_name="huggingface-cli",
+            )
+
+        # Otherwise: use `snapshot_download` to ensure all files come from the same revision
+        elif len(self.filenames) == 0:
+            allow_patterns = self.include
+            ignore_patterns = self.exclude
+        else:
+            allow_patterns = self.filenames
+            ignore_patterns = None
+
+        return snapshot_download(
+            repo_id=self.repo_id,
+            repo_type=self.repo_type,
+            revision=self.revision,
+            allow_patterns=allow_patterns,
+            ignore_patterns=ignore_patterns,
+            resume_download=self.resume_download,
+            force_download=self.force_download,
+            cache_dir=self.cache_dir,
+            token=self.token,
+            local_dir=self.local_dir,
+            library_name="huggingface-cli",
+            max_workers=self.max_workers,
+        )
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/env.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/env.py
new file mode 100644
index
0000000000000000000000000000000000000000..ad674738b2f137ec0b79c11ef35057a351de6d86 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/env.py @@ -0,0 +1,39 @@ +# Copyright 2022 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains command to print information about the environment. + +Usage: + huggingface-cli env +""" + +from argparse import _SubParsersAction + +from ..utils import dump_environment_info +from . import BaseHuggingfaceCLICommand +from ._cli_utils import show_deprecation_warning + + +class EnvironmentCommand(BaseHuggingfaceCLICommand): + def __init__(self, args): + self.args = args + + @staticmethod + def register_subcommand(parser: _SubParsersAction): + env_parser = parser.add_parser("env", help="Print information about the environment.") + env_parser.set_defaults(func=EnvironmentCommand) + + def run(self) -> None: + show_deprecation_warning("huggingface-cli env", "hf env") + + dump_environment_info() diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/huggingface_cli.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/huggingface_cli.py new file mode 100644 index 0000000000000000000000000000000000000000..697c85d1e386d9c954be0f8112cb12e1bc84e7fe --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/huggingface_cli.py @@ -0,0 +1,65 @@ +# Copyright 2020 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
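+
+# Every command in this module follows the same contract: a `register_subcommand`
+# staticmethod that attaches an argparse subparser and sets `func`, and a `run()`
+# method invoked by `main()`. A minimal sketch of a hypothetical command (not
+# part of this module):
+#
+#     class HelloCommand(BaseHuggingfaceCLICommand):
+#         @staticmethod
+#         def register_subcommand(parser):
+#             hello_parser = parser.add_parser("hello", help="Print a greeting.")
+#             hello_parser.set_defaults(func=HelloCommand)
+#
+#         def __init__(self, args):
+#             self.args = args
+#
+#         def run(self):
+#             print("hello")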
+
+from argparse import ArgumentParser
+
+from huggingface_hub.commands._cli_utils import show_deprecation_warning
+from huggingface_hub.commands.delete_cache import DeleteCacheCommand
+from huggingface_hub.commands.download import DownloadCommand
+from huggingface_hub.commands.env import EnvironmentCommand
+from huggingface_hub.commands.lfs import LfsCommands
+from huggingface_hub.commands.repo import RepoCommands
+from huggingface_hub.commands.repo_files import RepoFilesCommand
+from huggingface_hub.commands.scan_cache import ScanCacheCommand
+from huggingface_hub.commands.tag import TagCommands
+from huggingface_hub.commands.upload import UploadCommand
+from huggingface_hub.commands.upload_large_folder import UploadLargeFolderCommand
+from huggingface_hub.commands.user import UserCommands
+from huggingface_hub.commands.version import VersionCommand
+
+
+def main():
+    parser = ArgumentParser("huggingface-cli", usage="huggingface-cli <command> [<args>]")
+    commands_parser = parser.add_subparsers(help="huggingface-cli command helpers")
+
+    # Register commands
+    DownloadCommand.register_subcommand(commands_parser)
+    UploadCommand.register_subcommand(commands_parser)
+    RepoFilesCommand.register_subcommand(commands_parser)
+    EnvironmentCommand.register_subcommand(commands_parser)
+    UserCommands.register_subcommand(commands_parser)
+    RepoCommands.register_subcommand(commands_parser)
+    LfsCommands.register_subcommand(commands_parser)
+    ScanCacheCommand.register_subcommand(commands_parser)
+    DeleteCacheCommand.register_subcommand(commands_parser)
+    TagCommands.register_subcommand(commands_parser)
+    VersionCommand.register_subcommand(commands_parser)
+
+    # Experimental
+    UploadLargeFolderCommand.register_subcommand(commands_parser)
+
+    # Let's go
+    args = parser.parse_args()
+    if not hasattr(args, "func"):
+        show_deprecation_warning("huggingface-cli", "hf")
+        parser.print_help()
+        exit(1)
+
+    # Run
+    service = args.func(args)
+    service.run()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/lfs.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/lfs.py
new file mode 100644
index 0000000000000000000000000000000000000000..e510e345e6a4bf6da03f71b35cbfa2a4f0eb7325
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/lfs.py
@@ -0,0 +1,200 @@
+"""
+Implementation of a custom transfer agent for the transfer type "multipart" for
+git-lfs.
+
+Inspired by:
+github.com/cbartz/git-lfs-swift-transfer-agent/blob/master/git_lfs_swift_transfer.py
+
+Spec is: github.com/git-lfs/git-lfs/blob/master/docs/custom-transfers.md
+
+
+To launch debugger while developing:
+
+```
+[lfs "customtransfer.multipart"]
+path = /path/to/huggingface_hub/.env/bin/python
+args = -m debugpy --listen 5678 --wait-for-client /path/to/huggingface_hub/src/huggingface_hub/commands/huggingface_cli.py lfs-multipart-upload
+```
+"""
+
+import json
+import os
+import subprocess
+import sys
+from argparse import _SubParsersAction
+from typing import Dict, List, Optional
+
+from huggingface_hub.commands import BaseHuggingfaceCLICommand
+from huggingface_hub.lfs import LFS_MULTIPART_UPLOAD_COMMAND
+
+from ..utils import get_session, hf_raise_for_status, logging
+from ..utils._lfs import SliceFileObj
+
+
+logger = logging.get_logger(__name__)
+
+
+class LfsCommands(BaseHuggingfaceCLICommand):
+    """
+    Implementation of a custom transfer agent for the transfer type "multipart"
+    for git-lfs. This lets users upload large files >5GB 🔥.
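+
+    Enabling it (see `LfsEnableCommand` below) writes a section like the
+    following to the repo's `.git/config` (a sketch; the args value is
+    `LFS_MULTIPART_UPLOAD_COMMAND`, i.e. `lfs-multipart-upload`):
+
+    ```
+    [lfs "customtransfer.multipart"]
+        path = huggingface-cli
+        args = lfs-multipart-upload
+    ```
+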
Spec for LFS custom + transfer agent is: + https://github.com/git-lfs/git-lfs/blob/master/docs/custom-transfers.md + + This introduces two commands to the CLI: + + 1. $ huggingface-cli lfs-enable-largefiles + + This should be executed once for each model repo that contains a model file + >5GB. It's documented in the error message you get if you just try to git + push a 5GB file without having enabled it before. + + 2. $ huggingface-cli lfs-multipart-upload + + This command is called by lfs directly and is not meant to be called by the + user. + """ + + @staticmethod + def register_subcommand(parser: _SubParsersAction): + enable_parser = parser.add_parser( + "lfs-enable-largefiles", help="Configure your repository to enable upload of files > 5GB." + ) + enable_parser.add_argument("path", type=str, help="Local path to repository you want to configure.") + enable_parser.set_defaults(func=lambda args: LfsEnableCommand(args)) + + # Command will get called by git-lfs, do not call it directly. + upload_parser = parser.add_parser(LFS_MULTIPART_UPLOAD_COMMAND, add_help=False) + upload_parser.set_defaults(func=lambda args: LfsUploadCommand(args)) + + +class LfsEnableCommand: + def __init__(self, args): + self.args = args + + def run(self): + local_path = os.path.abspath(self.args.path) + if not os.path.isdir(local_path): + print("This does not look like a valid git repo.") + exit(1) + subprocess.run( + "git config lfs.customtransfer.multipart.path huggingface-cli".split(), + check=True, + cwd=local_path, + ) + subprocess.run( + f"git config lfs.customtransfer.multipart.args {LFS_MULTIPART_UPLOAD_COMMAND}".split(), + check=True, + cwd=local_path, + ) + print("Local repo set up for largefiles") + + +def write_msg(msg: Dict): + """Write out the message in Line delimited JSON.""" + msg_str = json.dumps(msg) + "\n" + sys.stdout.write(msg_str) + sys.stdout.flush() + + +def read_msg() -> Optional[Dict]: + """Read Line delimited JSON from stdin.""" + msg = json.loads(sys.stdin.readline().strip()) + + if "terminate" in (msg.get("type"), msg.get("event")): + # terminate message received + return None + + if msg.get("event") not in ("download", "upload"): + logger.critical("Received unexpected message") + sys.exit(1) + + return msg + + +class LfsUploadCommand: + def __init__(self, args) -> None: + self.args = args + + def run(self) -> None: + # Immediately after invoking a custom transfer process, git-lfs + # sends initiation data to the process over stdin. + # This tells the process useful information about the configuration. + init_msg = json.loads(sys.stdin.readline().strip()) + if not (init_msg.get("event") == "init" and init_msg.get("operation") == "upload"): + write_msg({"error": {"code": 32, "message": "Wrong lfs init operation"}}) + sys.exit(1) + + # The transfer process should use the information it needs from the + # initiation structure, and also perform any one-off setup tasks it + # needs to do. It should then respond on stdout with a simple empty + # confirmation structure, as follows: + write_msg({}) + + # After the initiation exchange, git-lfs will send any number of + # transfer requests to the stdin of the transfer process, in a serial sequence. + while True: + msg = read_msg() + if msg is None: + # When all transfers have been processed, git-lfs will send + # a terminate event to the stdin of the transfer process. + # On receiving this message the transfer process should + # clean up and terminate. No response is expected. 
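+                # Per the custom-transfer spec, the terminate message is
+                # `{"event": "terminate"}` (read_msg above also accepts it
+                # under the "type" key, defensively).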
+ sys.exit(0) + + oid = msg["oid"] + filepath = msg["path"] + completion_url = msg["action"]["href"] + header = msg["action"]["header"] + chunk_size = int(header.pop("chunk_size")) + presigned_urls: List[str] = list(header.values()) + + # Send a "started" progress event to allow other workers to start. + # Otherwise they're delayed until first "progress" event is reported, + # i.e. after the first 5GB by default (!) + write_msg( + { + "event": "progress", + "oid": oid, + "bytesSoFar": 1, + "bytesSinceLast": 0, + } + ) + + parts = [] + with open(filepath, "rb") as file: + for i, presigned_url in enumerate(presigned_urls): + with SliceFileObj( + file, + seek_from=i * chunk_size, + read_limit=chunk_size, + ) as data: + r = get_session().put(presigned_url, data=data) + hf_raise_for_status(r) + parts.append( + { + "etag": r.headers.get("etag"), + "partNumber": i + 1, + } + ) + # In order to support progress reporting while data is uploading / downloading, + # the transfer process should post messages to stdout + write_msg( + { + "event": "progress", + "oid": oid, + "bytesSoFar": (i + 1) * chunk_size, + "bytesSinceLast": chunk_size, + } + ) + # Not precise but that's ok. + + r = get_session().post( + completion_url, + json={ + "oid": oid, + "parts": parts, + }, + ) + hf_raise_for_status(r) + + write_msg({"event": "complete", "oid": oid}) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/repo.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/repo.py new file mode 100644 index 0000000000000000000000000000000000000000..fe75349d67bdc0314afe737daa7224b2a090f810 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/repo.py @@ -0,0 +1,151 @@ +# Copyright 2025 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains commands to interact with repositories on the Hugging Face Hub. 
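+
+The `repo create` command is a thin wrapper around `HfApi.create_repo`; the
+Python equivalent of the examples below (a minimal sketch):
+
+```py
+>>> from huggingface_hub import HfApi
+>>> HfApi().create_repo("my-cool-dataset", repo_type="dataset")
+>>> HfApi().create_repo("my-cool-model", private=True)
+```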
+
+Usage:
+    # create a new dataset repo on the Hub
+    huggingface-cli repo create my-cool-dataset --repo-type=dataset
+
+    # create a private model repo on the Hub
+    huggingface-cli repo create my-cool-model --private
+"""
+
+import argparse
+from argparse import _SubParsersAction
+from typing import Optional
+
+from huggingface_hub.commands import BaseHuggingfaceCLICommand
+from huggingface_hub.commands._cli_utils import ANSI
+from huggingface_hub.constants import SPACES_SDK_TYPES
+from huggingface_hub.hf_api import HfApi
+from huggingface_hub.utils import logging
+
+from ._cli_utils import show_deprecation_warning
+
+
+logger = logging.get_logger(__name__)
+
+
+class RepoCommands(BaseHuggingfaceCLICommand):
+    @staticmethod
+    def register_subcommand(parser: _SubParsersAction):
+        repo_parser = parser.add_parser("repo", help="{create} Commands to interact with your huggingface.co repos.")
+        repo_subparsers = repo_parser.add_subparsers(help="huggingface.co repos related commands")
+        repo_create_parser = repo_subparsers.add_parser("create", help="Create a new repo on huggingface.co")
+        repo_create_parser.add_argument(
+            "repo_id",
+            type=str,
+            help="The ID of the repo to create (e.g. `username/repo-name`). The username is optional and will be set to your username if not provided.",
+        )
+        repo_create_parser.add_argument(
+            "--repo-type",
+            type=str,
+            help='Optional: set to "dataset" or "space" if creating a dataset or space, default is model.',
+        )
+        repo_create_parser.add_argument(
+            "--space_sdk",
+            type=str,
+            help='Optional: Hugging Face Spaces SDK type. Required when --repo-type is set to "space".',
+            choices=SPACES_SDK_TYPES,
+        )
+        repo_create_parser.add_argument(
+            "--private",
+            action="store_true",
+            help="Whether to create a private repository. Defaults to public unless the organization's default is private.",
+        )
+        repo_create_parser.add_argument(
+            "--token",
+            type=str,
+            help="Hugging Face token. Will default to the locally saved token if not provided.",
+        )
+        repo_create_parser.add_argument(
+            "--exist-ok",
+            action="store_true",
+            help="Do not raise an error if repo already exists.",
+        )
+        repo_create_parser.add_argument(
+            "--resource-group-id",
+            type=str,
+            help="Resource group in which to create the repo. Resource groups are only available for Enterprise Hub organizations.",
+        )
+        repo_create_parser.add_argument(
+            "--type",
+            type=str,
+            help="[Deprecated]: use --repo-type instead.",
+        )
+        repo_create_parser.add_argument(
+            "-y",
+            "--yes",
+            action="store_true",
+            help="[Deprecated] no effect.",
+        )
+        repo_create_parser.add_argument(
+            "--organization", type=str, help="[Deprecated] Pass the organization namespace directly in the repo_id."
+        )
+        repo_create_parser.set_defaults(func=lambda args: RepoCreateCommand(args))
+
+
+class RepoCreateCommand:
+    def __init__(self, args: argparse.Namespace):
+        self.repo_id: str = args.repo_id
+        self.repo_type: Optional[str] = args.repo_type or args.type
+        self.space_sdk: Optional[str] = args.space_sdk
+        self.organization: Optional[str] = args.organization
+        self.yes: bool = args.yes
+        self.private: bool = args.private
+        self.token: Optional[str] = args.token
+        self.exist_ok: bool = args.exist_ok
+        self.resource_group_id: Optional[str] = args.resource_group_id
+
+        if args.type is not None:
+            print(
+                ANSI.yellow(
+                    "The --type argument is deprecated and will be removed in a future version. Use --repo-type instead."
+ ) + ) + if self.organization is not None: + print( + ANSI.yellow( + "The --organization argument is deprecated and will be removed in a future version. Pass the organization namespace directly in the repo_id." + ) + ) + if self.yes: + print( + ANSI.yellow( + "The --yes argument is deprecated and will be removed in a future version. It does not have any effect." + ) + ) + + self._api = HfApi() + + def run(self): + show_deprecation_warning("huggingface-cli repo", "hf repo") + + if self.organization is not None: + if "/" in self.repo_id: + print(ANSI.red("You cannot pass both --organization and a repo_id with a namespace.")) + exit(1) + self.repo_id = f"{self.organization}/{self.repo_id}" + + repo_url = self._api.create_repo( + repo_id=self.repo_id, + repo_type=self.repo_type, + private=self.private, + token=self.token, + exist_ok=self.exist_ok, + resource_group_id=self.resource_group_id, + space_sdk=self.space_sdk, + ) + print(f"Successfully created {ANSI.bold(repo_url.repo_id)} on the Hub.") + print(f"Your repo is now available at {ANSI.bold(repo_url)}") diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/repo_files.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/repo_files.py new file mode 100644 index 0000000000000000000000000000000000000000..da9685315ea67dc9d1e9921ecb2656244cae8783 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/repo_files.py @@ -0,0 +1,132 @@ +# coding=utf-8 +# Copyright 2023-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains command to update or delete files in a repository using the CLI. 
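+
+Deletion is performed via `HfApi.delete_files`; the Python equivalent of the
+examples below (a minimal sketch, repo ID and patterns illustrative):
+
+```py
+>>> from huggingface_hub import HfApi
+>>> HfApi().delete_files(repo_id="username/my-repo", delete_patterns=["*.json", "folder/*"])
+```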
+
+Usage:
+    # delete all
+    huggingface-cli repo-files <repo_id> delete "*"
+
+    # delete single file
+    huggingface-cli repo-files <repo_id> delete file.txt
+
+    # delete single folder
+    huggingface-cli repo-files <repo_id> delete folder/
+
+    # delete multiple
+    huggingface-cli repo-files <repo_id> delete file.txt folder/ file2.txt
+
+    # delete multiple patterns
+    huggingface-cli repo-files <repo_id> delete file.txt "*.json" "folder/*.parquet"
+
+    # delete from different revision / repo-type
+    huggingface-cli repo-files <repo_id> delete file.txt --revision=refs/pr/1 --repo-type=dataset
+"""
+
+from argparse import _SubParsersAction
+from typing import List, Optional
+
+from huggingface_hub import logging
+from huggingface_hub.commands import BaseHuggingfaceCLICommand
+from huggingface_hub.hf_api import HfApi
+
+from ._cli_utils import show_deprecation_warning
+
+
+logger = logging.get_logger(__name__)
+
+
+class DeleteFilesSubCommand:
+    def __init__(self, args) -> None:
+        self.args = args
+        self.repo_id: str = args.repo_id
+        self.repo_type: Optional[str] = args.repo_type
+        self.revision: Optional[str] = args.revision
+        self.api: HfApi = HfApi(token=args.token, library_name="huggingface-cli")
+        self.patterns: List[str] = args.patterns
+        self.commit_message: Optional[str] = args.commit_message
+        self.commit_description: Optional[str] = args.commit_description
+        self.create_pr: bool = args.create_pr
+        self.token: Optional[str] = args.token
+
+    def run(self) -> None:
+        show_deprecation_warning("huggingface-cli repo-files", "hf repo-files")
+
+        logging.set_verbosity_info()
+        url = self.api.delete_files(
+            delete_patterns=self.patterns,
+            repo_id=self.repo_id,
+            repo_type=self.repo_type,
+            revision=self.revision,
+            commit_message=self.commit_message,
+            commit_description=self.commit_description,
+            create_pr=self.create_pr,
+        )
+        print(f"Files correctly deleted from repo. Commit: {url}.")
+        logging.set_verbosity_warning()
+
+
+class RepoFilesCommand(BaseHuggingfaceCLICommand):
+    @staticmethod
+    def register_subcommand(parser: _SubParsersAction):
+        repo_files_parser = parser.add_parser("repo-files", help="Manage files in a repo on the Hub")
+        repo_files_parser.add_argument(
+            "repo_id", type=str, help="The ID of the repo to manage (e.g. `username/repo-name`)."
+        )
+        repo_files_subparsers = repo_files_parser.add_subparsers(
+            help="Action to execute against the files.",
+            required=True,
+        )
+        delete_subparser = repo_files_subparsers.add_parser(
+            "delete",
+            help="Delete files from a repo on the Hub",
+        )
+        delete_subparser.set_defaults(func=lambda args: DeleteFilesSubCommand(args))
+        delete_subparser.add_argument(
+            "patterns",
+            nargs="+",
+            type=str,
+            help="Glob patterns to match files to delete.",
+        )
+        delete_subparser.add_argument(
+            "--repo-type",
+            choices=["model", "dataset", "space"],
+            default="model",
+            help="Type of the repo to delete from (e.g. `dataset`).",
+        )
+        delete_subparser.add_argument(
+            "--revision",
+            type=str,
+            help=(
+                "An optional Git revision to push to. It can be a branch name"
+                " or a PR reference. If revision does not"
+                " exist and `--create-pr` is not set, a branch will be automatically created."
+            ),
+        )
+        delete_subparser.add_argument(
+            "--commit-message", type=str, help="The summary / title / first line of the generated commit."
+        )
+        delete_subparser.add_argument(
+            "--commit-description", type=str, help="The description of the generated commit."
+        )
+        delete_subparser.add_argument(
+            "--create-pr", action="store_true", help="Whether to create a new Pull Request for these changes."
+        )
+        repo_files_parser.add_argument(
+            "--token",
+            type=str,
+            help="A User Access Token generated from https://huggingface.co/settings/tokens",
+        )
+
+        repo_files_parser.set_defaults(func=RepoFilesCommand)
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/scan_cache.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/scan_cache.py
new file mode 100644
index 0000000000000000000000000000000000000000..711a5d09cc2b64b9c7f22a298e26a198b4dc48f1
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/scan_cache.py
@@ -0,0 +1,183 @@
+# coding=utf-8
+# Copyright 2022-present, the HuggingFace Inc. team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Contains command to scan the HF cache directory.
+
+Usage:
+    huggingface-cli scan-cache
+    huggingface-cli scan-cache -v
+    huggingface-cli scan-cache -vvv
+    huggingface-cli scan-cache --dir ~/.cache/huggingface/hub
+"""
+
+import time
+from argparse import Namespace, _SubParsersAction
+from typing import Optional
+
+from ..utils import CacheNotFound, HFCacheInfo, scan_cache_dir
+from . import BaseHuggingfaceCLICommand
+from ._cli_utils import ANSI, show_deprecation_warning, tabulate
+
+
+class ScanCacheCommand(BaseHuggingfaceCLICommand):
+    @staticmethod
+    def register_subcommand(parser: _SubParsersAction):
+        scan_cache_parser = parser.add_parser("scan-cache", help="Scan cache directory.")
+
+        scan_cache_parser.add_argument(
+            "--dir",
+            type=str,
+            default=None,
+            help="Cache directory to scan (optional). Defaults to the default HuggingFace cache.",
+        )
+        scan_cache_parser.add_argument(
+            "-v",
+            "--verbose",
+            action="count",
+            default=0,
+            help="Show a more verbose output.",
+        )
+        scan_cache_parser.set_defaults(func=ScanCacheCommand)
+
+    def __init__(self, args: Namespace) -> None:
+        self.verbosity: int = args.verbose
+        self.cache_dir: Optional[str] = args.dir
+
+    def run(self):
+        show_deprecation_warning("huggingface-cli scan-cache", "hf cache scan")
+
+        try:
+            t0 = time.time()
+            hf_cache_info = scan_cache_dir(self.cache_dir)
+            t1 = time.time()
+        except CacheNotFound as exc:
+            cache_dir = exc.cache_dir
+            print(f"Cache directory not found: {cache_dir}")
+            return
+
+        self._print_hf_cache_info_as_table(hf_cache_info)
+
+        print(
+            f"\nDone in {round(t1 - t0, 1)}s. Scanned {len(hf_cache_info.repos)} repo(s)"
+            f" for a total of {ANSI.red(hf_cache_info.size_on_disk_str)}."
+        )
+        if len(hf_cache_info.warnings) > 0:
+            message = f"Got {len(hf_cache_info.warnings)} warning(s) while scanning."
+            if self.verbosity >= 3:
+                print(ANSI.gray(message))
+                for warning in hf_cache_info.warnings:
+                    print(ANSI.gray(str(warning)))
+            else:
+                print(ANSI.gray(message + " Use -vvv to print details."))
+
+    def _print_hf_cache_info_as_table(self, hf_cache_info: HFCacheInfo) -> None:
+        print(get_table(hf_cache_info, verbosity=self.verbosity))
+
+
+def get_table(hf_cache_info: HFCacheInfo, *, verbosity: int = 0) -> str:
+    """Generate a table from the [`HFCacheInfo`] object.
+
+    Pass `verbosity=0` to get a table with a single row per repo, with columns
+    "repo_id", "repo_type", "size_on_disk", "nb_files", "last_accessed", "last_modified", "refs", "local_path".
+
+    Pass `verbosity=1` to get a table with a row per repo and revision (thus multiple rows can appear for a single repo), with columns
+    "repo_id", "repo_type", "revision", "size_on_disk", "nb_files", "last_modified", "refs", "local_path".
+
+    Example:
+    ```py
+    >>> from huggingface_hub.utils import scan_cache_dir
+    >>> from huggingface_hub.commands.scan_cache import get_table
+
+    >>> hf_cache_info = scan_cache_dir()
+    HFCacheInfo(...)
+
+    >>> print(get_table(hf_cache_info, verbosity=0))
+    REPO ID      REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS LOCAL PATH
+    ------------ --------- ------------ -------- ------------- ------------- ---- --------------------------------------------------------------
+    roberta-base model             2.7M        5 1 day ago     1 week ago    main C:\\Users\\admin\\.cache\\huggingface\\hub\\models--roberta-base
+    suno/bark    model             8.8K        1 1 week ago    1 week ago    main C:\\Users\\admin\\.cache\\huggingface\\hub\\models--suno--bark
+    t5-base      model           893.8M        4 4 days ago    7 months ago  main C:\\Users\\admin\\.cache\\huggingface\\hub\\models--t5-base
+    t5-large     model             3.0G        4 5 weeks ago   5 months ago  main C:\\Users\\admin\\.cache\\huggingface\\hub\\models--t5-large
+
+    >>> print(get_table(hf_cache_info, verbosity=1))
+    REPO ID      REPO TYPE REVISION                                 SIZE ON DISK NB FILES LAST_MODIFIED REFS LOCAL PATH
+    ------------ --------- ---------------------------------------- ------------ -------- ------------- ---- ------------------------------------------------------------------------------------------------------------
+    roberta-base model     e2da8e2f811d1448a5b465c236feacd80ffbac7b         2.7M        5 1 week ago    main C:\\Users\\admin\\.cache\\huggingface\\hub\\models--roberta-base\\snapshots\\e2da8e2f811d1448a5b465c236feacd80ffbac7b
+    suno/bark    model     70a8a7d34168586dc5d028fa9666aceade177992         8.8K        1 1 week ago    main C:\\Users\\admin\\.cache\\huggingface\\hub\\models--suno--bark\\snapshots\\70a8a7d34168586dc5d028fa9666aceade177992
+    t5-base      model     a9723ea7f1b39c1eae772870f3b547bf6ef7e6c1       893.8M        4 7 months ago  main C:\\Users\\admin\\.cache\\huggingface\\hub\\models--t5-base\\snapshots\\a9723ea7f1b39c1eae772870f3b547bf6ef7e6c1
+    t5-large     model     150ebc2c4b72291e770f58e6057481c8d2ed331a         3.0G        4 5 months ago  main C:\\Users\\admin\\.cache\\huggingface\\hub\\models--t5-large\\snapshots\\150ebc2c4b72291e770f58e6057481c8d2ed331a
+    ```
+
+    Args:
+        hf_cache_info ([`HFCacheInfo`]):
+            The HFCacheInfo object to print.
+        verbosity (`int`, *optional*):
+            The verbosity level. Defaults to 0.
+
+    Returns:
+        `str`: The table as a string.
+ """ + if verbosity == 0: + return tabulate( + rows=[ + [ + repo.repo_id, + repo.repo_type, + "{:>12}".format(repo.size_on_disk_str), + repo.nb_files, + repo.last_accessed_str, + repo.last_modified_str, + ", ".join(sorted(repo.refs)), + str(repo.repo_path), + ] + for repo in sorted(hf_cache_info.repos, key=lambda repo: repo.repo_path) + ], + headers=[ + "REPO ID", + "REPO TYPE", + "SIZE ON DISK", + "NB FILES", + "LAST_ACCESSED", + "LAST_MODIFIED", + "REFS", + "LOCAL PATH", + ], + ) + else: + return tabulate( + rows=[ + [ + repo.repo_id, + repo.repo_type, + revision.commit_hash, + "{:>12}".format(revision.size_on_disk_str), + revision.nb_files, + revision.last_modified_str, + ", ".join(sorted(revision.refs)), + str(revision.snapshot_path), + ] + for repo in sorted(hf_cache_info.repos, key=lambda repo: repo.repo_path) + for revision in sorted(repo.revisions, key=lambda revision: revision.commit_hash) + ], + headers=[ + "REPO ID", + "REPO TYPE", + "REVISION", + "SIZE ON DISK", + "NB FILES", + "LAST_MODIFIED", + "REFS", + "LOCAL PATH", + ], + ) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/tag.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/tag.py new file mode 100644 index 0000000000000000000000000000000000000000..405d407f8135d940cf078f905a6e66acd4b1dacc --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/tag.py @@ -0,0 +1,161 @@ +# coding=utf-8 +# Copyright 2024-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Contains commands to perform tag management with the CLI. + +Usage Examples: + - Create a tag: + $ huggingface-cli tag user/my-model 1.0 --message "First release" + $ huggingface-cli tag user/my-model 1.0 -m "First release" --revision develop + $ huggingface-cli tag user/my-dataset 1.0 -m "First release" --repo-type dataset + $ huggingface-cli tag user/my-space 1.0 + - List all tags: + $ huggingface-cli tag -l user/my-model + $ huggingface-cli tag --list user/my-dataset --repo-type dataset + - Delete a tag: + $ huggingface-cli tag -d user/my-model 1.0 + $ huggingface-cli tag --delete user/my-dataset 1.0 --repo-type dataset + $ huggingface-cli tag -d user/my-space 1.0 -y +""" + +from argparse import Namespace, _SubParsersAction + +from requests.exceptions import HTTPError + +from huggingface_hub.commands import BaseHuggingfaceCLICommand +from huggingface_hub.constants import ( + REPO_TYPES, +) +from huggingface_hub.hf_api import HfApi + +from ..errors import HfHubHTTPError, RepositoryNotFoundError, RevisionNotFoundError +from ._cli_utils import ANSI, show_deprecation_warning + + +class TagCommands(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction): + tag_parser = parser.add_parser("tag", help="(create, list, delete) tags for a repo in the hub") + + tag_parser.add_argument("repo_id", type=str, help="The ID of the repo to tag (e.g. 
`username/repo-name`).") + tag_parser.add_argument("tag", nargs="?", type=str, help="The name of the tag for creation or deletion.") + tag_parser.add_argument("-m", "--message", type=str, help="The description of the tag to create.") + tag_parser.add_argument("--revision", type=str, help="The git revision to tag.") + tag_parser.add_argument( + "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens." + ) + tag_parser.add_argument( + "--repo-type", + choices=["model", "dataset", "space"], + default="model", + help="Set the type of repository (model, dataset, or space).", + ) + tag_parser.add_argument("-y", "--yes", action="store_true", help="Answer Yes to prompts automatically.") + + tag_parser.add_argument("-l", "--list", action="store_true", help="List tags for a repository.") + tag_parser.add_argument("-d", "--delete", action="store_true", help="Delete a tag for a repository.") + + tag_parser.set_defaults(func=lambda args: handle_commands(args)) + + +def handle_commands(args: Namespace): + show_deprecation_warning("huggingface-cli tag", "hf repo tag") + + if args.list: + return TagListCommand(args) + elif args.delete: + return TagDeleteCommand(args) + else: + return TagCreateCommand(args) + + +class TagCommand: + def __init__(self, args: Namespace): + self.args = args + self.api = HfApi(token=self.args.token) + self.repo_id = self.args.repo_id + self.repo_type = self.args.repo_type + if self.repo_type not in REPO_TYPES: + print("Invalid repo --repo-type") + exit(1) + + +class TagCreateCommand(TagCommand): + def run(self): + print(f"You are about to create tag {ANSI.bold(self.args.tag)} on {self.repo_type} {ANSI.bold(self.repo_id)}") + + try: + self.api.create_tag( + repo_id=self.repo_id, + tag=self.args.tag, + tag_message=self.args.message, + revision=self.args.revision, + repo_type=self.repo_type, + ) + except RepositoryNotFoundError: + print(f"{self.repo_type.capitalize()} {ANSI.bold(self.repo_id)} not found.") + exit(1) + except RevisionNotFoundError: + print(f"Revision {ANSI.bold(self.args.revision)} not found.") + exit(1) + except HfHubHTTPError as e: + if e.response.status_code == 409: + print(f"Tag {ANSI.bold(self.args.tag)} already exists on {ANSI.bold(self.repo_id)}") + exit(1) + raise e + + print(f"Tag {ANSI.bold(self.args.tag)} created on {ANSI.bold(self.repo_id)}") + + +class TagListCommand(TagCommand): + def run(self): + try: + refs = self.api.list_repo_refs( + repo_id=self.repo_id, + repo_type=self.repo_type, + ) + except RepositoryNotFoundError: + print(f"{self.repo_type.capitalize()} {ANSI.bold(self.repo_id)} not found.") + exit(1) + except HTTPError as e: + print(e) + print(ANSI.red(e.response.text)) + exit(1) + if len(refs.tags) == 0: + print("No tags found") + exit(0) + print(f"Tags for {self.repo_type} {ANSI.bold(self.repo_id)}:") + for tag in refs.tags: + print(tag.name) + + +class TagDeleteCommand(TagCommand): + def run(self): + print(f"You are about to delete tag {ANSI.bold(self.args.tag)} on {self.repo_type} {ANSI.bold(self.repo_id)}") + + if not self.args.yes: + choice = input("Proceed? 
[Y/n] ").lower() + if choice not in ("", "y", "yes"): + print("Abort") + exit() + try: + self.api.delete_tag(repo_id=self.repo_id, tag=self.args.tag, repo_type=self.repo_type) + except RepositoryNotFoundError: + print(f"{self.repo_type.capitalize()} {ANSI.bold(self.repo_id)} not found.") + exit(1) + except RevisionNotFoundError: + print(f"Tag {ANSI.bold(self.args.tag)} not found on {ANSI.bold(self.repo_id)}") + exit(1) + print(f"Tag {ANSI.bold(self.args.tag)} deleted on {ANSI.bold(self.repo_id)}") diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/upload.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/upload.py new file mode 100644 index 0000000000000000000000000000000000000000..c778555cda56eb17c905f0728fef6712acc75cb8 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/upload.py @@ -0,0 +1,318 @@ +# coding=utf-8 +# Copyright 2023-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains command to upload a repo or file with the CLI. + +Usage: + # Upload file (implicit) + huggingface-cli upload my-cool-model ./my-cool-model.safetensors + + # Upload file (explicit) + huggingface-cli upload my-cool-model ./my-cool-model.safetensors model.safetensors + + # Upload directory (implicit). If `my-cool-model/` is a directory it will be uploaded, otherwise an exception is raised. + huggingface-cli upload my-cool-model + + # Upload directory (explicit) + huggingface-cli upload my-cool-model ./models/my-cool-model . + + # Upload filtered directory (example: tensorboard logs except for the last run) + huggingface-cli upload my-cool-model ./model/training /logs --include "*.tfevents.*" --exclude "*20230905*" + + # Upload with wildcard + huggingface-cli upload my-cool-model "./model/training/*.safetensors" + + # Upload private dataset + huggingface-cli upload Wauplin/my-cool-dataset ./data . 
--repo-type=dataset --private + + # Upload with token + huggingface-cli upload Wauplin/my-cool-model --token=hf_**** + + # Sync local Space with Hub (upload new files, delete removed files) + huggingface-cli upload Wauplin/space-example --repo-type=space --exclude="/logs/*" --delete="*" --commit-message="Sync local Space with Hub" + + # Schedule commits every 30 minutes + huggingface-cli upload Wauplin/my-cool-model --every=30 +""" + +import os +import time +import warnings +from argparse import Namespace, _SubParsersAction +from typing import List, Optional + +from huggingface_hub import logging +from huggingface_hub._commit_scheduler import CommitScheduler +from huggingface_hub.commands import BaseHuggingfaceCLICommand +from huggingface_hub.constants import HF_HUB_ENABLE_HF_TRANSFER +from huggingface_hub.errors import RevisionNotFoundError +from huggingface_hub.hf_api import HfApi +from huggingface_hub.utils import disable_progress_bars, enable_progress_bars +from huggingface_hub.utils._runtime import is_xet_available + +from ._cli_utils import show_deprecation_warning + + +logger = logging.get_logger(__name__) + + +class UploadCommand(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction): + upload_parser = parser.add_parser("upload", help="Upload a file or a folder to a repo on the Hub") + upload_parser.add_argument( + "repo_id", type=str, help="The ID of the repo to upload to (e.g. `username/repo-name`)." + ) + upload_parser.add_argument( + "local_path", + nargs="?", + help="Local path to the file or folder to upload. Wildcard patterns are supported. Defaults to current directory.", + ) + upload_parser.add_argument( + "path_in_repo", + nargs="?", + help="Path of the file or folder in the repo. Defaults to the relative path of the file or folder.", + ) + upload_parser.add_argument( + "--repo-type", + choices=["model", "dataset", "space"], + default="model", + help="Type of the repo to upload to (e.g. `dataset`).", + ) + upload_parser.add_argument( + "--revision", + type=str, + help=( + "An optional Git revision to push to. It can be a branch name or a PR reference. If revision does not" + " exist and `--create-pr` is not set, a branch will be automatically created." + ), + ) + upload_parser.add_argument( + "--private", + action="store_true", + help=( + "Whether to create a private repo if repo doesn't exist on the Hub. Ignored if the repo already" + " exists." + ), + ) + upload_parser.add_argument("--include", nargs="*", type=str, help="Glob patterns to match files to upload.") + upload_parser.add_argument( + "--exclude", nargs="*", type=str, help="Glob patterns to exclude from files to upload." + ) + upload_parser.add_argument( + "--delete", + nargs="*", + type=str, + help="Glob patterns for file to be deleted from the repo while committing.", + ) + upload_parser.add_argument( + "--commit-message", type=str, help="The summary / title / first line of the generated commit." + ) + upload_parser.add_argument("--commit-description", type=str, help="The description of the generated commit.") + upload_parser.add_argument( + "--create-pr", action="store_true", help="Whether to upload content as a new Pull Request." 
+        )
+        upload_parser.add_argument(
+            "--every",
+            type=float,
+            help="If set, a background job is scheduled to create commits every `every` minutes.",
+        )
+        upload_parser.add_argument(
+            "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens"
+        )
+        upload_parser.add_argument(
+            "--quiet",
+            action="store_true",
+            help="If True, progress bars are disabled and only the path to the uploaded files is printed.",
+        )
+        upload_parser.set_defaults(func=UploadCommand)
+
+    def __init__(self, args: Namespace) -> None:
+        self.repo_id: str = args.repo_id
+        self.repo_type: Optional[str] = args.repo_type
+        self.revision: Optional[str] = args.revision
+        self.private: bool = args.private
+
+        self.include: Optional[List[str]] = args.include
+        self.exclude: Optional[List[str]] = args.exclude
+        self.delete: Optional[List[str]] = args.delete
+
+        self.commit_message: Optional[str] = args.commit_message
+        self.commit_description: Optional[str] = args.commit_description
+        self.create_pr: bool = args.create_pr
+        self.api: HfApi = HfApi(token=args.token, library_name="huggingface-cli")
+        self.quiet: bool = args.quiet  # disable warnings and progress bars
+
+        # Check `--every` is valid
+        if args.every is not None and args.every <= 0:
+            raise ValueError(f"`every` must be a positive value (got '{args.every}')")
+        self.every: Optional[float] = args.every
+
+        # Resolve `local_path` and `path_in_repo`
+        repo_name: str = args.repo_id.split("/")[-1]  # e.g. "Wauplin/my-cool-model" => "my-cool-model"
+        self.local_path: str
+        self.path_in_repo: str
+
+        if args.local_path is not None and any(c in args.local_path for c in ["*", "?", "["]):
+            if args.include is not None:
+                raise ValueError("Cannot set `--include` when passing a `local_path` containing a wildcard.")
+            if args.path_in_repo is not None and args.path_in_repo != ".":
+                raise ValueError("Cannot set `path_in_repo` when passing a `local_path` containing a wildcard.")
+            self.local_path = "."
+            self.include = args.local_path
+            self.path_in_repo = "."
+        elif args.local_path is None and os.path.isfile(repo_name):
+            # Implicit case 1: user provided only a repo_id which happens to be a local file as well => upload it with the same name
+            self.local_path = repo_name
+            self.path_in_repo = repo_name
+        elif args.local_path is None and os.path.isdir(repo_name):
+            # Implicit case 2: user provided only a repo_id which happens to be a local folder as well => upload it at the root
+            self.local_path = repo_name
+            self.path_in_repo = "."
+        elif args.local_path is None:
+            # Implicit case 3: user provided only a repo_id that does not match a local file or folder
+            # => the user must explicitly provide a local_path => raise exception
+            raise ValueError(f"'{repo_name}' is not a local file or folder. Please set `local_path` explicitly.")
+        elif args.path_in_repo is None and os.path.isfile(args.local_path):
+            # Explicit local path to file, no path in repo => upload it at root with same name
+            self.local_path = args.local_path
+            self.path_in_repo = os.path.basename(args.local_path)
+        elif args.path_in_repo is None:
+            # Explicit local path to folder, no path in repo => upload at root
+            self.local_path = args.local_path
+            self.path_in_repo = "."
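+        # Examples of the resolution above (hypothetical invocations):
+        #   `huggingface-cli upload me/repo ./model.safetensors` -> local_path="./model.safetensors", path_in_repo="model.safetensors"
+        #   `huggingface-cli upload me/repo ./checkpoints`       -> local_path="./checkpoints", path_in_repo="."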
+ else: + # Finally, if both paths are explicit + self.local_path = args.local_path + self.path_in_repo = args.path_in_repo + + def run(self) -> None: + show_deprecation_warning("huggingface-cli upload", "hf upload") + + if self.quiet: + disable_progress_bars() + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + print(self._upload()) + enable_progress_bars() + else: + logging.set_verbosity_info() + print(self._upload()) + logging.set_verbosity_warning() + + def _upload(self) -> str: + if os.path.isfile(self.local_path): + if self.include is not None and len(self.include) > 0: + warnings.warn("Ignoring `--include` since a single file is uploaded.") + if self.exclude is not None and len(self.exclude) > 0: + warnings.warn("Ignoring `--exclude` since a single file is uploaded.") + if self.delete is not None and len(self.delete) > 0: + warnings.warn("Ignoring `--delete` since a single file is uploaded.") + + if not is_xet_available() and not HF_HUB_ENABLE_HF_TRANSFER: + logger.info( + "Consider using `hf_transfer` for faster uploads. This solution comes with some limitations. See" + " https://huggingface.co/docs/huggingface_hub/hf_transfer for more details." + ) + + # Schedule commits if `every` is set + if self.every is not None: + if os.path.isfile(self.local_path): + # If file => watch entire folder + use allow_patterns + folder_path = os.path.dirname(self.local_path) + path_in_repo = ( + self.path_in_repo[: -len(self.local_path)] # remove filename from path_in_repo + if self.path_in_repo.endswith(self.local_path) + else self.path_in_repo + ) + allow_patterns = [self.local_path] + ignore_patterns = [] + else: + folder_path = self.local_path + path_in_repo = self.path_in_repo + allow_patterns = self.include or [] + ignore_patterns = self.exclude or [] + if self.delete is not None and len(self.delete) > 0: + warnings.warn("Ignoring `--delete` when uploading with scheduled commits.") + + scheduler = CommitScheduler( + folder_path=folder_path, + repo_id=self.repo_id, + repo_type=self.repo_type, + revision=self.revision, + allow_patterns=allow_patterns, + ignore_patterns=ignore_patterns, + path_in_repo=path_in_repo, + private=self.private, + every=self.every, + hf_api=self.api, + ) + print(f"Scheduling commits every {self.every} minutes to {scheduler.repo_id}.") + try: # Block main thread until KeyboardInterrupt + while True: + time.sleep(100) + except KeyboardInterrupt: + scheduler.stop() + return "Stopped scheduled commits." + + # Otherwise, create repo and proceed with the upload + if not os.path.isfile(self.local_path) and not os.path.isdir(self.local_path): + raise FileNotFoundError(f"No such file or directory: '{self.local_path}'.") + repo_id = self.api.create_repo( + repo_id=self.repo_id, + repo_type=self.repo_type, + exist_ok=True, + private=self.private, + space_sdk="gradio" if self.repo_type == "space" else None, + # ^ We don't want it to fail when uploading to a Space => let's set Gradio by default. + # ^ I'd rather not add CLI args to set it explicitly as we already have `huggingface-cli repo create` for that. + ).repo_id + + # Check if branch already exists and if not, create it + if self.revision is not None and not self.create_pr: + try: + self.api.repo_info(repo_id=repo_id, repo_type=self.repo_type, revision=self.revision) + except RevisionNotFoundError: + logger.info(f"Branch '{self.revision}' not found. 
Creating it...") + self.api.create_branch(repo_id=repo_id, repo_type=self.repo_type, branch=self.revision, exist_ok=True) + # ^ `exist_ok=True` to avoid race concurrency issues + + # File-based upload + if os.path.isfile(self.local_path): + return self.api.upload_file( + path_or_fileobj=self.local_path, + path_in_repo=self.path_in_repo, + repo_id=repo_id, + repo_type=self.repo_type, + revision=self.revision, + commit_message=self.commit_message, + commit_description=self.commit_description, + create_pr=self.create_pr, + ) + + # Folder-based upload + else: + return self.api.upload_folder( + folder_path=self.local_path, + path_in_repo=self.path_in_repo, + repo_id=repo_id, + repo_type=self.repo_type, + revision=self.revision, + commit_message=self.commit_message, + commit_description=self.commit_description, + create_pr=self.create_pr, + allow_patterns=self.include, + ignore_patterns=self.exclude, + delete_patterns=self.delete, + ) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/upload_large_folder.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/upload_large_folder.py new file mode 100644 index 0000000000000000000000000000000000000000..3105ba3f57f5644aa18e627aa5d1d18e61515ae7 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/upload_large_folder.py @@ -0,0 +1,131 @@ +# coding=utf-8 +# Copyright 2023-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains command to upload a large folder with the CLI.""" + +import os +from argparse import Namespace, _SubParsersAction +from typing import List, Optional + +from huggingface_hub import logging +from huggingface_hub.commands import BaseHuggingfaceCLICommand +from huggingface_hub.hf_api import HfApi +from huggingface_hub.utils import disable_progress_bars + +from ._cli_utils import ANSI, show_deprecation_warning + + +logger = logging.get_logger(__name__) + + +class UploadLargeFolderCommand(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction): + subparser = parser.add_parser("upload-large-folder", help="Upload a large folder to a repo on the Hub") + subparser.add_argument( + "repo_id", type=str, help="The ID of the repo to upload to (e.g. `username/repo-name`)." + ) + subparser.add_argument("local_path", type=str, help="Local path to the file or folder to upload.") + subparser.add_argument( + "--repo-type", + choices=["model", "dataset", "space"], + help="Type of the repo to upload to (e.g. `dataset`).", + ) + subparser.add_argument( + "--revision", + type=str, + help=("An optional Git revision to push to. It can be a branch name or a PR reference."), + ) + subparser.add_argument( + "--private", + action="store_true", + help=( + "Whether to create a private repo if repo doesn't exist on the Hub. Ignored if the repo already exists." 
+ ), + ) + subparser.add_argument("--include", nargs="*", type=str, help="Glob patterns to match files to upload.") + subparser.add_argument("--exclude", nargs="*", type=str, help="Glob patterns to exclude from files to upload.") + subparser.add_argument( + "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens" + ) + subparser.add_argument( + "--num-workers", type=int, help="Number of workers to use to hash, upload and commit files." + ) + subparser.add_argument("--no-report", action="store_true", help="Whether to disable regular status report.") + subparser.add_argument("--no-bars", action="store_true", help="Whether to disable progress bars.") + subparser.set_defaults(func=UploadLargeFolderCommand) + + def __init__(self, args: Namespace) -> None: + self.repo_id: str = args.repo_id + self.local_path: str = args.local_path + self.repo_type: str = args.repo_type + self.revision: Optional[str] = args.revision + self.private: bool = args.private + + self.include: Optional[List[str]] = args.include + self.exclude: Optional[List[str]] = args.exclude + + self.api: HfApi = HfApi(token=args.token, library_name="huggingface-cli") + + self.num_workers: Optional[int] = args.num_workers + self.no_report: bool = args.no_report + self.no_bars: bool = args.no_bars + + if not os.path.isdir(self.local_path): + raise ValueError("Large upload is only supported for folders.") + + def run(self) -> None: + show_deprecation_warning("huggingface-cli upload-large-folder", "hf upload-large-folder") + + logging.set_verbosity_info() + + print( + ANSI.yellow( + "You are about to upload a large folder to the Hub using `huggingface-cli upload-large-folder`. " + "This is a new feature so feedback is very welcome!\n" + "\n" + "A few things to keep in mind:\n" + " - Repository limits still apply: https://huggingface.co/docs/hub/repositories-recommendations\n" + " - Do not start several processes in parallel.\n" + " - You can interrupt and resume the process at any time. " + "The script will pick up where it left off except for partially uploaded files that would have to be entirely reuploaded.\n" + " - Do not upload the same folder to several repositories. If you need to do so, you must delete the `./.cache/huggingface/` folder first.\n" + "\n" + f"Some temporary metadata will be stored under `{self.local_path}/.cache/huggingface`.\n" + " - You must not modify those files manually.\n" + " - You must not delete the `./.cache/huggingface/` folder while a process is running.\n" + " - You can delete the `./.cache/huggingface/` folder to reinitialize the upload state when process is not running. Files will have to be hashed and preuploaded again, except for already committed files.\n" + "\n" + "If the process output is too verbose, you can disable the progress bars with `--no-bars`. " + "You can also entirely disable the status report with `--no-report`.\n" + "\n" + "For more details, run `huggingface-cli upload-large-folder --help` or check the documentation at " + "https://huggingface.co/docs/huggingface_hub/guides/upload#upload-a-large-folder." 
+ ) + ) + + if self.no_bars: + disable_progress_bars() + + self.api.upload_large_folder( + repo_id=self.repo_id, + folder_path=self.local_path, + repo_type=self.repo_type, + revision=self.revision, + private=self.private, + allow_patterns=self.include, + ignore_patterns=self.exclude, + num_workers=self.num_workers, + print_report=not self.no_report, + ) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/user.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/user.py new file mode 100644 index 0000000000000000000000000000000000000000..3f4da0f45d0dae5bc4458f844f776db9c3971208 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/user.py @@ -0,0 +1,208 @@ +# Copyright 2020 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains commands to authenticate to the Hugging Face Hub and interact with your repositories. + +Usage: + # login and save token locally. + huggingface-cli login --token=hf_*** --add-to-git-credential + + # switch between tokens + huggingface-cli auth switch + + # list all tokens + huggingface-cli auth list + + # logout from a specific token, if no token-name is provided, all tokens will be deleted from your machine. 
+ huggingface-cli logout --token-name=your_token_name + + # find out which huggingface.co account you are logged in as + huggingface-cli whoami +""" + +from argparse import _SubParsersAction +from typing import List, Optional + +from requests.exceptions import HTTPError + +from huggingface_hub.commands import BaseHuggingfaceCLICommand +from huggingface_hub.constants import ENDPOINT +from huggingface_hub.hf_api import HfApi + +from .._login import auth_list, auth_switch, login, logout +from ..utils import get_stored_tokens, get_token, logging +from ._cli_utils import ANSI, show_deprecation_warning + + +logger = logging.get_logger(__name__) + +try: + from InquirerPy import inquirer + from InquirerPy.base.control import Choice + + _inquirer_py_available = True +except ImportError: + _inquirer_py_available = False + + +class UserCommands(BaseHuggingfaceCLICommand): + @staticmethod + def register_subcommand(parser: _SubParsersAction): + login_parser = parser.add_parser("login", help="Log in using a token from huggingface.co/settings/tokens") + login_parser.add_argument( + "--token", + type=str, + help="Token generated from https://huggingface.co/settings/tokens", + ) + login_parser.add_argument( + "--add-to-git-credential", + action="store_true", + help="Optional: Save token to git credential helper.", + ) + login_parser.set_defaults(func=lambda args: LoginCommand(args)) + whoami_parser = parser.add_parser("whoami", help="Find out which huggingface.co account you are logged in as.") + whoami_parser.set_defaults(func=lambda args: WhoamiCommand(args)) + + logout_parser = parser.add_parser("logout", help="Log out") + logout_parser.add_argument( + "--token-name", + type=str, + help="Optional: Name of the access token to log out from.", + ) + logout_parser.set_defaults(func=lambda args: LogoutCommand(args)) + + auth_parser = parser.add_parser("auth", help="Other authentication related commands") + auth_subparsers = auth_parser.add_subparsers(help="Authentication subcommands") + auth_switch_parser = auth_subparsers.add_parser("switch", help="Switch between access tokens") + auth_switch_parser.add_argument( + "--token-name", + type=str, + help="Optional: Name of the access token to switch to.", + ) + auth_switch_parser.add_argument( + "--add-to-git-credential", + action="store_true", + help="Optional: Save token to git credential helper.", + ) + auth_switch_parser.set_defaults(func=lambda args: AuthSwitchCommand(args)) + auth_list_parser = auth_subparsers.add_parser("list", help="List all stored access tokens") + auth_list_parser.set_defaults(func=lambda args: AuthListCommand(args)) + + +class BaseUserCommand: + def __init__(self, args): + self.args = args + self._api = HfApi() + + +class LoginCommand(BaseUserCommand): + def run(self): + show_deprecation_warning("huggingface-cli login", "hf auth login") + + logging.set_verbosity_info() + login( + token=self.args.token, + add_to_git_credential=self.args.add_to_git_credential, + ) + + +class LogoutCommand(BaseUserCommand): + def run(self): + show_deprecation_warning("huggingface-cli logout", "hf auth logout") + + logging.set_verbosity_info() + logout(token_name=self.args.token_name) + + +class AuthSwitchCommand(BaseUserCommand): + def run(self): + show_deprecation_warning("huggingface-cli auth switch", "hf auth switch") + + logging.set_verbosity_info() + token_name = self.args.token_name + if token_name is None: + token_name = self._select_token_name() + + if token_name is None: + print("No token name provided. 
Aborting.") + exit() + auth_switch(token_name, add_to_git_credential=self.args.add_to_git_credential) + + def _select_token_name(self) -> Optional[str]: + token_names = list(get_stored_tokens().keys()) + + if not token_names: + logger.error("No stored tokens found. Please login first.") + return None + + if _inquirer_py_available: + return self._select_token_name_tui(token_names) + # if inquirer is not available, use a simpler terminal UI + print("Available stored tokens:") + for i, token_name in enumerate(token_names, 1): + print(f"{i}. {token_name}") + while True: + try: + choice = input("Enter the number of the token to switch to (or 'q' to quit): ") + if choice.lower() == "q": + return None + index = int(choice) - 1 + if 0 <= index < len(token_names): + return token_names[index] + else: + print("Invalid selection. Please try again.") + except ValueError: + print("Invalid input. Please enter a number or 'q' to quit.") + + def _select_token_name_tui(self, token_names: List[str]) -> Optional[str]: + choices = [Choice(token_name, name=token_name) for token_name in token_names] + try: + return inquirer.select( + message="Select a token to switch to:", + choices=choices, + default=None, + ).execute() + except KeyboardInterrupt: + logger.info("Token selection cancelled.") + return None + + +class AuthListCommand(BaseUserCommand): + def run(self): + show_deprecation_warning("huggingface-cli auth list", "hf auth list") + + logging.set_verbosity_info() + auth_list() + + +class WhoamiCommand(BaseUserCommand): + def run(self): + show_deprecation_warning("huggingface-cli whoami", "hf auth whoami") + + token = get_token() + if token is None: + print("Not logged in") + exit() + try: + info = self._api.whoami(token) + print(info["name"]) + orgs = [org["name"] for org in info["orgs"]] + if orgs: + print(ANSI.bold("orgs: "), ",".join(orgs)) + + if ENDPOINT != "https://huggingface.co": + print(f"Authenticated through private endpoint: {ENDPOINT}") + except HTTPError as e: + print(e) + print(ANSI.red(e.response.text)) + exit(1) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/version.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/version.py new file mode 100644 index 0000000000000000000000000000000000000000..10d341bcdb93e0616fcf80370ac8dde63b15ce9c --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/commands/version.py @@ -0,0 +1,40 @@ +# Copyright 2022 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains command to print information about the version. + +Usage: + huggingface-cli version +""" + +from argparse import _SubParsersAction + +from huggingface_hub import __version__ + +from . 
import BaseHuggingfaceCLICommand +from ._cli_utils import show_deprecation_warning + + +class VersionCommand(BaseHuggingfaceCLICommand): + def __init__(self, args): + self.args = args + + @staticmethod + def register_subcommand(parser: _SubParsersAction): + version_parser = parser.add_parser("version", help="Print information about the huggingface-cli version.") + version_parser.set_defaults(func=VersionCommand) + + def run(self) -> None: + show_deprecation_warning("huggingface-cli version", "hf version") + + print(f"huggingface_hub version: {__version__}") diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..574de5b0b5b177d853e91f08ffe6f40d2efd986a Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/__pycache__/_common.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/__pycache__/_common.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..deb6fdbeb26c490bc9393bcb675a5772bde83cda Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/__pycache__/_common.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_client.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_client.py new file mode 100644 index 0000000000000000000000000000000000000000..f50e7d56eb9fd5b0362fb60819d97ee406ded94f --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_client.py @@ -0,0 +1,3368 @@ +# coding=utf-8 +# Copyright 2023-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Related resources: +# https://huggingface.co/tasks +# https://huggingface.co/docs/huggingface.js/inference/README +# https://github.com/huggingface/huggingface.js/tree/main/packages/inference/src +# https://github.com/huggingface/text-generation-inference/tree/main/clients/python +# https://github.com/huggingface/text-generation-inference/blob/main/clients/python/text_generation/client.py +# https://huggingface.slack.com/archives/C03E4DQ9LAJ/p1680169099087869 +# https://github.com/huggingface/unity-api#tasks +# +# Some TODO: +# - add all tasks +# +# NOTE: the philosophy of this client is "let's make it as easy as possible to use it, even if less optimized". 
Some +# examples of how it translates: +# - Timeout / Server unavailable is handled by the client in a single "timeout" parameter. +# - Files can be provided as bytes, file paths, or URLs and the client will try to "guess" the type. +# - Images are parsed as PIL.Image for easier manipulation. +# - Provides a "recommended model" for each task => suboptimal but user-wise quicker to get a first script running. +# - Only the main parameters are publicly exposed. Power users can always read the docs for more options. +import base64 +import logging +import re +import warnings +from typing import TYPE_CHECKING, Any, Dict, Iterable, List, Literal, Optional, Union, overload + +from requests import HTTPError + +from huggingface_hub import constants +from huggingface_hub.errors import BadRequestError, InferenceTimeoutError +from huggingface_hub.inference._common import ( + TASKS_EXPECTING_IMAGES, + ContentT, + RequestParameters, + _b64_encode, + _b64_to_image, + _bytes_to_dict, + _bytes_to_image, + _bytes_to_list, + _get_unsupported_text_generation_kwargs, + _import_numpy, + _set_unsupported_text_generation_kwargs, + _stream_chat_completion_response, + _stream_text_generation_response, + raise_text_generation_error, +) +from huggingface_hub.inference._generated.types import ( + AudioClassificationOutputElement, + AudioClassificationOutputTransform, + AudioToAudioOutputElement, + AutomaticSpeechRecognitionOutput, + ChatCompletionInputGrammarType, + ChatCompletionInputMessage, + ChatCompletionInputStreamOptions, + ChatCompletionInputTool, + ChatCompletionInputToolChoiceClass, + ChatCompletionInputToolChoiceEnum, + ChatCompletionOutput, + ChatCompletionStreamOutput, + DocumentQuestionAnsweringOutputElement, + FillMaskOutputElement, + ImageClassificationOutputElement, + ImageClassificationOutputTransform, + ImageSegmentationOutputElement, + ImageSegmentationSubtask, + ImageToImageTargetSize, + ImageToTextOutput, + ImageToVideoTargetSize, + ObjectDetectionOutputElement, + Padding, + QuestionAnsweringOutputElement, + SummarizationOutput, + SummarizationTruncationStrategy, + TableQuestionAnsweringOutputElement, + TextClassificationOutputElement, + TextClassificationOutputTransform, + TextGenerationInputGrammarType, + TextGenerationOutput, + TextGenerationStreamOutput, + TextToSpeechEarlyStoppingEnum, + TokenClassificationAggregationStrategy, + TokenClassificationOutputElement, + TranslationOutput, + TranslationTruncationStrategy, + VisualQuestionAnsweringOutputElement, + ZeroShotClassificationOutputElement, + ZeroShotImageClassificationOutputElement, +) +from huggingface_hub.inference._providers import PROVIDER_OR_POLICY_T, get_provider_helper +from huggingface_hub.utils import build_hf_headers, get_session, hf_raise_for_status +from huggingface_hub.utils._auth import get_token + + +if TYPE_CHECKING: + import numpy as np + from PIL.Image import Image + +logger = logging.getLogger(__name__) + + +MODEL_KWARGS_NOT_USED_REGEX = re.compile(r"The following `model_kwargs` are not used by the model: \[(.*?)\]") + + +class InferenceClient: + """ + Initialize a new Inference Client. + + [`InferenceClient`] aims to provide a unified experience to perform inference. The client can be used + seamlessly with either the (free) Inference API, self-hosted Inference Endpoints, or third-party Inference Providers. + + Args: + model (`str`, `optional`): + The model to run inference with. Can be a model id hosted on the Hugging Face Hub, e.g. `meta-llama/Meta-Llama-3-8B-Instruct` + or a URL to a deployed Inference Endpoint. 
Defaults to None, in which case a recommended model is + automatically selected for the task. + Note: for better compatibility with OpenAI's client, `model` has been aliased as `base_url`. Those 2 + arguments are mutually exclusive. If a URL is passed as `model` or `base_url` for chat completion, the `(/v1)/chat/completions` suffix path will be appended to the URL. + provider (`str`, *optional*): + Name of the provider to use for inference. Can be `"black-forest-labs"`, `"cerebras"`, `"clarifai"`, `"cohere"`, `"fal-ai"`, `"featherless-ai"`, `"fireworks-ai"`, `"groq"`, `"hf-inference"`, `"hyperbolic"`, `"nebius"`, `"novita"`, `"nscale"`, `"openai"`, `publicai`, `"replicate"`, `"sambanova"`, `"scaleway"`, `"together"` or `"zai-org"`. + Defaults to "auto" i.e. the first of the providers available for the model, sorted by the user's order in https://hf.co/settings/inference-providers. + If model is a URL or `base_url` is passed, then `provider` is not used. + token (`str`, *optional*): + Hugging Face token. Will default to the locally saved token if not provided. + Note: for better compatibility with OpenAI's client, `token` has been aliased as `api_key`. Those 2 + arguments are mutually exclusive and have the exact same behavior. + timeout (`float`, `optional`): + The maximum number of seconds to wait for a response from the server. Defaults to None, meaning it will loop until the server is available. + headers (`Dict[str, str]`, `optional`): + Additional headers to send to the server. By default only the authorization and user-agent headers are sent. + Values in this dictionary will override the default values. + bill_to (`str`, `optional`): + The billing account to use for the requests. By default the requests are billed on the user's account. + Requests can only be billed to an organization the user is a member of, and which has subscribed to Enterprise Hub. + cookies (`Dict[str, str]`, `optional`): + Additional cookies to send to the server. + proxies (`Any`, `optional`): + Proxies to use for the request. + base_url (`str`, `optional`): + Base URL to run inference. This is a duplicated argument from `model` to make [`InferenceClient`] + follow the same pattern as `openai.OpenAI` client. Cannot be used if `model` is set. Defaults to None. + api_key (`str`, `optional`): + Token to use for authentication. This is a duplicated argument from `token` to make [`InferenceClient`] + follow the same pattern as `openai.OpenAI` client. Cannot be used if `token` is set. Defaults to None. + """ + + def __init__( + self, + model: Optional[str] = None, + *, + provider: Optional[PROVIDER_OR_POLICY_T] = None, + token: Optional[str] = None, + timeout: Optional[float] = None, + headers: Optional[Dict[str, str]] = None, + cookies: Optional[Dict[str, str]] = None, + proxies: Optional[Any] = None, + bill_to: Optional[str] = None, + # OpenAI compatibility + base_url: Optional[str] = None, + api_key: Optional[str] = None, + ) -> None: + if model is not None and base_url is not None: + raise ValueError( + "Received both `model` and `base_url` arguments. Please provide only one of them." + " `base_url` is an alias for `model` to make the API compatible with OpenAI's client." + " If using `base_url` for chat completion, the `/chat/completions` suffix path will be appended to the base url." + " When passing a URL as `model`, the client will not append any suffix path to it." + ) + if token is not None and api_key is not None: + raise ValueError( + "Received both `token` and `api_key` arguments. 
Please provide only one of them." + " `api_key` is an alias for `token` to make the API compatible with OpenAI's client." + " It has the exact same behavior as `token`." + ) + token = token if token is not None else api_key + if isinstance(token, bool): + # Legacy behavior: previously it was possible to pass `token=False` to disable authentication. This is not + # supported anymore as authentication is required. Better to explicitly raise here rather than risking + # sending the locally saved token without the user knowing about it. + if token is False: + raise ValueError( + "Cannot use `token=False` to disable authentication as authentication is required to run Inference." + ) + warnings.warn( + "Using `token=True` to automatically use the locally saved token is deprecated and will be removed in a future release. " + "Please use `token=None` instead (default).", + DeprecationWarning, + ) + token = get_token() + + self.model: Optional[str] = base_url or model + self.token: Optional[str] = token + + self.headers = {**headers} if headers is not None else {} + if bill_to is not None: + if ( + constants.HUGGINGFACE_HEADER_X_BILL_TO in self.headers + and self.headers[constants.HUGGINGFACE_HEADER_X_BILL_TO] != bill_to + ): + warnings.warn( + f"Overriding existing '{self.headers[constants.HUGGINGFACE_HEADER_X_BILL_TO]}' value in headers with '{bill_to}'.", + UserWarning, + ) + self.headers[constants.HUGGINGFACE_HEADER_X_BILL_TO] = bill_to + + if token is not None and not token.startswith("hf_"): + warnings.warn( + "You've provided an external provider's API key, so requests will be billed directly by the provider. " + "The `bill_to` parameter is only applicable for Hugging Face billing and will be ignored.", + UserWarning, + ) + + # Configure provider + self.provider = provider + + self.cookies = cookies + self.timeout = timeout + self.proxies = proxies + + def __repr__(self): + return f"<InferenceClient(model='{self.model}', timeout={self.timeout})>" + + @overload + def _inner_post( # type: ignore[misc] + self, request_parameters: RequestParameters, *, stream: Literal[False] = ... + ) -> bytes: ... + + @overload + def _inner_post( # type: ignore[misc] + self, request_parameters: RequestParameters, *, stream: Literal[True] = ... + ) -> Iterable[bytes]: ... + + @overload + def _inner_post( + self, request_parameters: RequestParameters, *, stream: bool = False + ) -> Union[bytes, Iterable[bytes]]: ...
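The three `@overload` stubs above let a static type checker tie `_inner_post`'s return type to the `stream` flag, even though a single implementation handles both cases. A generic, self-contained sketch of the same pattern (the `fetch` name and payload are illustrative, not part of the library):

```py
# Sketch of the Literal-based overload pattern used by `_inner_post`.
from typing import Iterable, Literal, Union, overload


@overload
def fetch(stream: Literal[False] = ...) -> bytes: ...
@overload
def fetch(stream: Literal[True]) -> Iterable[bytes]: ...
def fetch(stream: bool = False) -> Union[bytes, Iterable[bytes]]:
    data = b"payload"
    # Streaming callers get an iterator; others get the raw bytes.
    return iter([data]) if stream else data
```

With this in place, `fetch(stream=True)` type-checks as `Iterable[bytes]` and plain `fetch()` as `bytes`, which is how the streamed and non-streamed variants of `chat_completion` are kept apart as well.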
+ + def _inner_post( + self, request_parameters: RequestParameters, *, stream: bool = False + ) -> Union[bytes, Iterable[bytes]]: + """Make a request to the inference server.""" + # TODO: this should be handled in provider helpers directly + if request_parameters.task in TASKS_EXPECTING_IMAGES and "Accept" not in request_parameters.headers: + request_parameters.headers["Accept"] = "image/png" + + try: + response = get_session().post( + request_parameters.url, + json=request_parameters.json, + data=request_parameters.data, + headers=request_parameters.headers, + cookies=self.cookies, + timeout=self.timeout, + stream=stream, + proxies=self.proxies, + ) + except TimeoutError as error: + # Convert any `TimeoutError` to a `InferenceTimeoutError` + raise InferenceTimeoutError(f"Inference call timed out: {request_parameters.url}") from error # type: ignore + + try: + hf_raise_for_status(response) + return response.iter_lines() if stream else response.content + except HTTPError as error: + if error.response.status_code == 422 and request_parameters.task != "unknown": + msg = str(error.args[0]) + if len(error.response.text) > 0: + msg += f"\n{error.response.text}\n" + error.args = (msg,) + error.args[1:] + raise + + def audio_classification( + self, + audio: ContentT, + *, + model: Optional[str] = None, + top_k: Optional[int] = None, + function_to_apply: Optional["AudioClassificationOutputTransform"] = None, + ) -> List[AudioClassificationOutputElement]: + """ + Perform audio classification on the provided audio content. + + Args: + audio (Union[str, Path, bytes, BinaryIO]): + The audio content to classify. It can be raw audio bytes, a local audio file, or a URL pointing to an + audio file. + model (`str`, *optional*): + The model to use for audio classification. Can be a model ID hosted on the Hugging Face Hub + or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for + audio classification will be used. + top_k (`int`, *optional*): + When specified, limits the output to the top K most probable classes. + function_to_apply (`"AudioClassificationOutputTransform"`, *optional*): + The function to apply to the model outputs in order to retrieve the scores. + + Returns: + `List[AudioClassificationOutputElement]`: List of [`AudioClassificationOutputElement`] items containing the predicted labels and their confidence. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `HTTPError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient() + >>> client.audio_classification("audio.flac") + [ + AudioClassificationOutputElement(score=0.4976358711719513, label='hap'), + AudioClassificationOutputElement(score=0.3677836060523987, label='neu'), + ... 
+ ] + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="audio-classification", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=audio, + parameters={"function_to_apply": function_to_apply, "top_k": top_k}, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = self._inner_post(request_parameters) + return AudioClassificationOutputElement.parse_obj_as_list(response) + + def audio_to_audio( + self, + audio: ContentT, + *, + model: Optional[str] = None, + ) -> List[AudioToAudioOutputElement]: + """ + Performs multiple tasks related to audio-to-audio depending on the model (eg: speech enhancement, source separation). + + Args: + audio (Union[str, Path, bytes, BinaryIO]): + The audio content for the model. It can be raw audio bytes, a local audio file, or a URL pointing to an + audio file. + model (`str`, *optional*): + The model can be any model which takes an audio file and returns another audio file. Can be a model ID hosted on the Hugging Face Hub + or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for + audio_to_audio will be used. + + Returns: + `List[AudioToAudioOutputElement]`: A list of [`AudioToAudioOutputElement`] items containing audios label, content-type, and audio content in blob. + + Raises: + `InferenceTimeoutError`: + If the model is unavailable or the request times out. + `HTTPError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient() + >>> audio_output = client.audio_to_audio("audio.flac") + >>> for i, item in enumerate(audio_output): + >>> with open(f"output_{i}.flac", "wb") as f: + f.write(item.blob) + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="audio-to-audio", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=audio, + parameters={}, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = self._inner_post(request_parameters) + audio_output = AudioToAudioOutputElement.parse_obj_as_list(response) + for item in audio_output: + item.blob = base64.b64decode(item.blob) + return audio_output + + def automatic_speech_recognition( + self, + audio: ContentT, + *, + model: Optional[str] = None, + extra_body: Optional[Dict] = None, + ) -> AutomaticSpeechRecognitionOutput: + """ + Perform automatic speech recognition (ASR or audio-to-text) on the given audio content. + + Args: + audio (Union[str, Path, bytes, BinaryIO]): + The content to transcribe. It can be raw audio bytes, local audio file, or a URL to an audio file. + model (`str`, *optional*): + The model to use for ASR. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed + Inference Endpoint. If not provided, the default recommended model for ASR will be used. + extra_body (`Dict`, *optional*): + Additional provider-specific parameters to pass to the model. Refer to the provider's documentation + for supported parameters. + Returns: + [`AutomaticSpeechRecognitionOutput`]: An item containing the transcribed text and optionally the timestamp chunks. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `HTTPError`: + If the request fails with an HTTP error status code other than HTTP 503. 
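Before the library's own example below, a minimal caller-side sketch (the file path and timeout value are placeholders) of how the two documented failure modes surface: timeouts arrive as `InferenceTimeoutError`, everything else as `requests.HTTPError`:

```py
# Hedged sketch: handling the errors listed in the Raises section.
from requests import HTTPError

from huggingface_hub import InferenceClient
from huggingface_hub.errors import InferenceTimeoutError

client = InferenceClient(timeout=30)  # give up after 30 seconds

try:
    labels = client.audio_classification("audio.flac")  # placeholder path
    print(labels[0].label, labels[0].score)
except InferenceTimeoutError:
    print("Model unavailable or the request timed out")
except HTTPError as err:
    # Non-503 HTTP errors are re-raised; 422 responses include the
    # server's validation message in the exception text.
    print(f"Inference request failed: {err}")
```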
+ + Example: + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient() + >>> client.automatic_speech_recognition("hello_world.flac").text + "hello world" + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="automatic-speech-recognition", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=audio, + parameters={**(extra_body or {})}, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = self._inner_post(request_parameters) + return AutomaticSpeechRecognitionOutput.parse_obj_as_instance(response) + + @overload + def chat_completion( # type: ignore + self, + messages: List[Union[Dict, ChatCompletionInputMessage]], + *, + model: Optional[str] = None, + stream: Literal[False] = False, + frequency_penalty: Optional[float] = None, + logit_bias: Optional[List[float]] = None, + logprobs: Optional[bool] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[ChatCompletionInputGrammarType] = None, + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stream_options: Optional[ChatCompletionInputStreamOptions] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[ChatCompletionInputToolChoiceClass, "ChatCompletionInputToolChoiceEnum"]] = None, + tool_prompt: Optional[str] = None, + tools: Optional[List[ChatCompletionInputTool]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + extra_body: Optional[Dict] = None, + ) -> ChatCompletionOutput: ... + + @overload + def chat_completion( # type: ignore + self, + messages: List[Union[Dict, ChatCompletionInputMessage]], + *, + model: Optional[str] = None, + stream: Literal[True] = True, + frequency_penalty: Optional[float] = None, + logit_bias: Optional[List[float]] = None, + logprobs: Optional[bool] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[ChatCompletionInputGrammarType] = None, + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stream_options: Optional[ChatCompletionInputStreamOptions] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[ChatCompletionInputToolChoiceClass, "ChatCompletionInputToolChoiceEnum"]] = None, + tool_prompt: Optional[str] = None, + tools: Optional[List[ChatCompletionInputTool]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + extra_body: Optional[Dict] = None, + ) -> Iterable[ChatCompletionStreamOutput]: ... 
+ + @overload + def chat_completion( + self, + messages: List[Union[Dict, ChatCompletionInputMessage]], + *, + model: Optional[str] = None, + stream: bool = False, + frequency_penalty: Optional[float] = None, + logit_bias: Optional[List[float]] = None, + logprobs: Optional[bool] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[ChatCompletionInputGrammarType] = None, + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stream_options: Optional[ChatCompletionInputStreamOptions] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[ChatCompletionInputToolChoiceClass, "ChatCompletionInputToolChoiceEnum"]] = None, + tool_prompt: Optional[str] = None, + tools: Optional[List[ChatCompletionInputTool]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + extra_body: Optional[Dict] = None, + ) -> Union[ChatCompletionOutput, Iterable[ChatCompletionStreamOutput]]: ... + + def chat_completion( + self, + messages: List[Union[Dict, ChatCompletionInputMessage]], + *, + model: Optional[str] = None, + stream: bool = False, + # Parameters from ChatCompletionInput (handled manually) + frequency_penalty: Optional[float] = None, + logit_bias: Optional[List[float]] = None, + logprobs: Optional[bool] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[ChatCompletionInputGrammarType] = None, + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stream_options: Optional[ChatCompletionInputStreamOptions] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[ChatCompletionInputToolChoiceClass, "ChatCompletionInputToolChoiceEnum"]] = None, + tool_prompt: Optional[str] = None, + tools: Optional[List[ChatCompletionInputTool]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + extra_body: Optional[Dict] = None, + ) -> Union[ChatCompletionOutput, Iterable[ChatCompletionStreamOutput]]: + """ + A method for completing conversations using a specified language model. + + > [!TIP] + > The `client.chat_completion` method is aliased as `client.chat.completions.create` for compatibility with OpenAI's client. + > Inputs and outputs are strictly the same and using either syntax will yield the same results. + > Check out the [Inference guide](https://huggingface.co/docs/huggingface_hub/guides/inference#openai-compatibility) + > for more details about OpenAI's compatibility. + + > [!TIP] + > You can pass provider-specific parameters to the model by using the `extra_body` argument. + + Args: + messages (List of [`ChatCompletionInputMessage`]): + Conversation history consisting of roles and content pairs. + model (`str`, *optional*): + The model to use for chat-completion. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed + Inference Endpoint. If not provided, the default recommended model for chat-based text-generation will be used. + See https://huggingface.co/tasks/text-generation for more details. + If `model` is a model ID, it is passed to the server as the `model` parameter. If you want to define a + custom URL while setting `model` in the request payload, you must set `base_url` when initializing [`InferenceClient`]. + frequency_penalty (`float`, *optional*): + Penalizes new tokens based on their existing frequency + in the text so far. Range: [-2.0, 2.0]. Defaults to 0.0. 
+ logit_bias (`List[float]`, *optional*): + Adjusts the likelihood of specific tokens appearing in the generated output. + logprobs (`bool`, *optional*): + Whether to return log probabilities of the output tokens or not. If true, returns the log + probabilities of each output token returned in the content of message. + max_tokens (`int`, *optional*): + Maximum number of tokens allowed in the response. Defaults to 100. + n (`int`, *optional*): + The number of completions to generate for each prompt. + presence_penalty (`float`, *optional*): + Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the + text so far, increasing the model's likelihood to talk about new topics. + response_format ([`ChatCompletionInputGrammarType`], *optional*): + Grammar constraints. Can be either a JSONSchema or a regex. + seed (Optional[`int`], *optional*): + Seed for reproducible control flow. Defaults to None. + stop (`List[str]`, *optional*): + Up to four strings which trigger the end of the response. + Defaults to None. + stream (`bool`, *optional*): + Enable realtime streaming of responses. Defaults to False. + stream_options ([`ChatCompletionInputStreamOptions`], *optional*): + Options for streaming completions. + temperature (`float`, *optional*): + Controls randomness of the generations. Lower values ensure + less random completions. Range: [0, 2]. Defaults to 1.0. + top_logprobs (`int`, *optional*): + An integer between 0 and 5 specifying the number of most likely tokens to return at each token + position, each with an associated log probability. logprobs must be set to true if this parameter is + used. + top_p (`float`, *optional*): + Fraction of the most likely next words to sample from. + Must be between 0 and 1. Defaults to 1.0. + tool_choice ([`ChatCompletionInputToolChoiceClass`] or [`ChatCompletionInputToolChoiceEnum`], *optional*): + The tool to use for the completion. Defaults to "auto". + tool_prompt (`str`, *optional*): + A prompt to be appended before the tools. + tools (List of [`ChatCompletionInputTool`], *optional*): + A list of tools the model may call. Currently, only functions are supported as a tool. Use this to + provide a list of functions the model may generate JSON inputs for. + extra_body (`Dict`, *optional*): + Additional provider-specific parameters to pass to the model. Refer to the provider's documentation + for supported parameters. + Returns: + [`ChatCompletionOutput`] or Iterable of [`ChatCompletionStreamOutput`]: + Generated text returned from the server: + - if `stream=False`, the generated text is returned as a [`ChatCompletionOutput`] (default). + - if `stream=True`, the generated text is returned token by token as a sequence of [`ChatCompletionStreamOutput`]. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `HTTPError`: + If the request fails with an HTTP error status code other than HTTP 503. 
+ + Example: + + ```py + >>> from huggingface_hub import InferenceClient + >>> messages = [{"role": "user", "content": "What is the capital of France?"}] + >>> client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct") + >>> client.chat_completion(messages, max_tokens=100) + ChatCompletionOutput( + choices=[ + ChatCompletionOutputComplete( + finish_reason='eos_token', + index=0, + message=ChatCompletionOutputMessage( + role='assistant', + content='The capital of France is Paris.', + name=None, + tool_calls=None + ), + logprobs=None + ) + ], + created=1719907176, + id='', + model='meta-llama/Meta-Llama-3-8B-Instruct', + object='text_completion', + system_fingerprint='2.0.4-sha-f426a33', + usage=ChatCompletionOutputUsage( + completion_tokens=8, + prompt_tokens=17, + total_tokens=25 + ) + ) + ``` + + Example using streaming: + ```py + >>> from huggingface_hub import InferenceClient + >>> messages = [{"role": "user", "content": "What is the capital of France?"}] + >>> client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct") + >>> for token in client.chat_completion(messages, max_tokens=10, stream=True): + ... print(token) + ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content='The', role='assistant'), index=0, finish_reason=None)], created=1710498504) + ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=' capital', role='assistant'), index=0, finish_reason=None)], created=1710498504) + (...) + ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=' may', role='assistant'), index=0, finish_reason=None)], created=1710498504) + ``` + + Example using OpenAI's syntax: + ```py + # instead of `from openai import OpenAI` + from huggingface_hub import InferenceClient + + # instead of `client = OpenAI(...)` + client = InferenceClient( + base_url=..., + api_key=..., + ) + + output = client.chat.completions.create( + model="meta-llama/Meta-Llama-3-8B-Instruct", + messages=[ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "Count to 10"}, + ], + stream=True, + max_tokens=1024, + ) + + for chunk in output: + print(chunk.choices[0].delta.content) + ``` + + Example using a third-party provider directly with extra (provider-specific) parameters. Usage will be billed on your Together AI account. + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient( + ... provider="together", # Use Together AI provider + ... api_key="", # Pass your Together API key directly + ... ) + >>> client.chat_completion( + ... model="meta-llama/Meta-Llama-3-8B-Instruct", + ... messages=[{"role": "user", "content": "What is the capital of France?"}], + ... extra_body={"safety_model": "Meta-Llama/Llama-Guard-7b"}, + ... ) + ``` + + Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account. + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient( + ... provider="sambanova", # Use Sambanova provider + ... api_key="hf_...", # Pass your HF token + ... ) + >>> client.chat_completion( + ... model="meta-llama/Meta-Llama-3-8B-Instruct", + ... messages=[{"role": "user", "content": "What is the capital of France?"}], + ... 
) + ``` + + Example using Image + Text as input: + ```py + >>> from huggingface_hub import InferenceClient + + # provide a remote URL + >>> image_url ="https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" + # or a base64-encoded image + >>> image_path = "/path/to/image.jpeg" + >>> with open(image_path, "rb") as f: + ... base64_image = base64.b64encode(f.read()).decode("utf-8") + >>> image_url = f"data:image/jpeg;base64,{base64_image}" + + >>> client = InferenceClient("meta-llama/Llama-3.2-11B-Vision-Instruct") + >>> output = client.chat.completions.create( + ... messages=[ + ... { + ... "role": "user", + ... "content": [ + ... { + ... "type": "image_url", + ... "image_url": {"url": image_url}, + ... }, + ... { + ... "type": "text", + ... "text": "Describe this image in one sentence.", + ... }, + ... ], + ... }, + ... ], + ... ) + >>> output + The image depicts the iconic Statue of Liberty situated in New York Harbor, New York, on a clear day. + ``` + + Example using tools: + ```py + >>> client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct") + >>> messages = [ + ... { + ... "role": "system", + ... "content": "Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous.", + ... }, + ... { + ... "role": "user", + ... "content": "What's the weather like the next 3 days in San Francisco, CA?", + ... }, + ... ] + >>> tools = [ + ... { + ... "type": "function", + ... "function": { + ... "name": "get_current_weather", + ... "description": "Get the current weather", + ... "parameters": { + ... "type": "object", + ... "properties": { + ... "location": { + ... "type": "string", + ... "description": "The city and state, e.g. San Francisco, CA", + ... }, + ... "format": { + ... "type": "string", + ... "enum": ["celsius", "fahrenheit"], + ... "description": "The temperature unit to use. Infer this from the users location.", + ... }, + ... }, + ... "required": ["location", "format"], + ... }, + ... }, + ... }, + ... { + ... "type": "function", + ... "function": { + ... "name": "get_n_day_weather_forecast", + ... "description": "Get an N-day weather forecast", + ... "parameters": { + ... "type": "object", + ... "properties": { + ... "location": { + ... "type": "string", + ... "description": "The city and state, e.g. San Francisco, CA", + ... }, + ... "format": { + ... "type": "string", + ... "enum": ["celsius", "fahrenheit"], + ... "description": "The temperature unit to use. Infer this from the users location.", + ... }, + ... "num_days": { + ... "type": "integer", + ... "description": "The number of days to forecast", + ... }, + ... }, + ... "required": ["location", "format", "num_days"], + ... }, + ... }, + ... }, + ... ] + + >>> response = client.chat_completion( + ... model="meta-llama/Meta-Llama-3-70B-Instruct", + ... messages=messages, + ... tools=tools, + ... tool_choice="auto", + ... max_tokens=500, + ... ) + >>> response.choices[0].message.tool_calls[0].function + ChatCompletionOutputFunctionDefinition( + arguments={ + 'location': 'San Francisco, CA', + 'format': 'fahrenheit', + 'num_days': 3 + }, + name='get_n_day_weather_forecast', + description=None + ) + ``` + + Example using response_format: + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct") + >>> messages = [ + ... { + ... "role": "user", + ... "content": "I saw a puppy a cat and a raccoon during my bike ride in the park. What did I saw and when?", + ... 
}, + ... ] + >>> response_format = { + ... "type": "json", + ... "value": { + ... "properties": { + ... "location": {"type": "string"}, + ... "activity": {"type": "string"}, + ... "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5}, + ... "animals": {"type": "array", "items": {"type": "string"}}, + ... }, + ... "required": ["location", "activity", "animals_seen", "animals"], + ... }, + ... } + >>> response = client.chat_completion( + ... messages=messages, + ... response_format=response_format, + ... max_tokens=500, + ... ) + >>> response.choices[0].message.content + '{\n\n"activity": "bike ride",\n"animals": ["puppy", "cat", "raccoon"],\n"animals_seen": 3,\n"location": "park"}' + ``` + """ + # Since `chat_completion(..., model=xxx)` is also a payload parameter for the server, we need to handle 'model' differently. + # `self.model` takes precedence over 'model' argument for building URL. + # `model` takes precedence for payload value. + model_id_or_url = self.model or model + payload_model = model or self.model + + # Get the provider helper + provider_helper = get_provider_helper( + self.provider, + task="conversational", + model=model_id_or_url + if model_id_or_url is not None and model_id_or_url.startswith(("http://", "https://")) + else payload_model, + ) + + # Prepare the payload + parameters = { + "model": payload_model, + "frequency_penalty": frequency_penalty, + "logit_bias": logit_bias, + "logprobs": logprobs, + "max_tokens": max_tokens, + "n": n, + "presence_penalty": presence_penalty, + "response_format": response_format, + "seed": seed, + "stop": stop, + "temperature": temperature, + "tool_choice": tool_choice, + "tool_prompt": tool_prompt, + "tools": tools, + "top_logprobs": top_logprobs, + "top_p": top_p, + "stream": stream, + "stream_options": stream_options, + **(extra_body or {}), + } + request_parameters = provider_helper.prepare_request( + inputs=messages, + parameters=parameters, + headers=self.headers, + model=model_id_or_url, + api_key=self.token, + ) + data = self._inner_post(request_parameters, stream=stream) + + if stream: + return _stream_chat_completion_response(data) # type: ignore[arg-type] + + return ChatCompletionOutput.parse_obj_as_instance(data) # type: ignore[arg-type] + + def document_question_answering( + self, + image: ContentT, + question: str, + *, + model: Optional[str] = None, + doc_stride: Optional[int] = None, + handle_impossible_answer: Optional[bool] = None, + lang: Optional[str] = None, + max_answer_len: Optional[int] = None, + max_question_len: Optional[int] = None, + max_seq_len: Optional[int] = None, + top_k: Optional[int] = None, + word_boxes: Optional[List[Union[List[float], str]]] = None, + ) -> List[DocumentQuestionAnsweringOutputElement]: + """ + Answer questions on document images. + + Args: + image (`Union[str, Path, bytes, BinaryIO]`): + The input image for the context. It can be raw bytes, an image file, or a URL to an online image. + question (`str`): + Question to be answered. + model (`str`, *optional*): + The model to use for the document question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. If not provided, the default recommended document question answering model will be used. + Defaults to None. + doc_stride (`int`, *optional*): + If the words in the document are too long to fit with the question for the model, it will be split in + several chunks with some overlap. This argument controls the size of that overlap. 
+ handle_impossible_answer (`bool`, *optional*): + Whether to accept impossible as an answer. + lang (`str`, *optional*): + Language to use while running OCR. Defaults to English. + max_answer_len (`int`, *optional*): + The maximum length of predicted answers (e.g., only answers with a shorter length are considered). + max_question_len (`int`, *optional*): + The maximum length of the question after tokenization. It will be truncated if needed. + max_seq_len (`int`, *optional*): + The maximum length of the total sentence (context + question) in tokens of each chunk passed to the + model. The context will be split in several chunks (using doc_stride as overlap) if needed. + top_k (`int`, *optional*): + The number of answers to return (will be chosen by order of likelihood). Can return less than top_k + answers if there are not enough options available within the context. + word_boxes (`List[Union[List[float], str]]`, *optional*): + A list of words and bounding boxes (normalized 0->1000). If provided, the inference will skip the OCR + step and use the provided bounding boxes instead. + Returns: + `List[DocumentQuestionAnsweringOutputElement]`: a list of [`DocumentQuestionAnsweringOutputElement`] items containing the predicted label, associated probability, word ids, and page number. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `HTTPError`: + If the request fails with an HTTP error status code other than HTTP 503. + + + Example: + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient() + >>> client.document_question_answering(image="https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png", question="What is the invoice number?") + [DocumentQuestionAnsweringOutputElement(answer='us-001', end=16, score=0.9999666213989258, start=16)] + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="document-question-answering", model=model_id) + inputs: Dict[str, Any] = {"question": question, "image": _b64_encode(image)} + request_parameters = provider_helper.prepare_request( + inputs=inputs, + parameters={ + "doc_stride": doc_stride, + "handle_impossible_answer": handle_impossible_answer, + "lang": lang, + "max_answer_len": max_answer_len, + "max_question_len": max_question_len, + "max_seq_len": max_seq_len, + "top_k": top_k, + "word_boxes": word_boxes, + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = self._inner_post(request_parameters) + return DocumentQuestionAnsweringOutputElement.parse_obj_as_list(response) + + def feature_extraction( + self, + text: str, + *, + normalize: Optional[bool] = None, + prompt_name: Optional[str] = None, + truncate: Optional[bool] = None, + truncation_direction: Optional[Literal["Left", "Right"]] = None, + model: Optional[str] = None, + ) -> "np.ndarray": + """ + Generate embeddings for a given text. + + Args: + text (`str`): + The text to embed. + model (`str`, *optional*): + The model to use for the feature extraction task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. If not provided, the default recommended feature extraction model will be used. + Defaults to None. + normalize (`bool`, *optional*): + Whether to normalize the embeddings or not. + Only available on a server powered by Text-Embedding-Inference. + prompt_name (`str`, *optional*): + The name of the prompt that should be used for encoding.
If not set, no prompt will be applied. + Must be a key in the `Sentence Transformers` configuration `prompts` dictionary. + For example, if ``prompt_name`` is "query" and the ``prompts`` dictionary is {"query": "query: ", ...}, + then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" + because the prompt text will be prepended to any text to encode. + truncate (`bool`, *optional*): + Whether to truncate the embeddings or not. + Only available on a server powered by Text-Embedding-Inference. + truncation_direction (`Literal["Left", "Right"]`, *optional*): + Which side of the input should be truncated when `truncate=True` is passed. + + Returns: + `np.ndarray`: The embedding representing the input text as a float32 numpy array. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `HTTPError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient() + >>> client.feature_extraction("Hi, who are you?") + array([[ 2.424802 , 2.93384 , 1.1750331 , ..., 1.240499, -0.13776633, -0.7889173 ], + [-0.42943227, -0.6364878 , -1.693462 , ..., 0.41978157, -2.4336355 , 0.6162071 ], + ..., + [ 0.28552425, -0.928395 , -1.2077185 , ..., 0.76810825, -2.1069427 , 0.6236161 ]], dtype=float32) + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="feature-extraction", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=text, + parameters={ + "normalize": normalize, + "prompt_name": prompt_name, + "truncate": truncate, + "truncation_direction": truncation_direction, + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = self._inner_post(request_parameters) + np = _import_numpy() + return np.array(provider_helper.get_response(response), dtype="float32") + + def fill_mask( + self, + text: str, + *, + model: Optional[str] = None, + targets: Optional[List[str]] = None, + top_k: Optional[int] = None, + ) -> List[FillMaskOutputElement]: + """ + Fill in a hole with a missing word (a token, to be precise). + + Args: + text (`str`): + The string to be filled in; it must contain the [MASK] token (check the model card for the exact name of the mask). + model (`str`, *optional*): + The model to use for the fill mask task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. If not provided, the default recommended fill mask model will be used. + targets (`List[str]`, *optional*): + When passed, the model will limit the scores to the passed targets instead of looking up in the whole + vocabulary. If the provided targets are not in the model vocab, they will be tokenized and the first + resulting token will be used (with a warning, and that might be slower). + top_k (`int`, *optional*): + When passed, overrides the number of predictions to return. + Returns: + `List[FillMaskOutputElement]`: a list of [`FillMaskOutputElement`] items containing the predicted label, associated + probability, token reference, and completed text. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `HTTPError`: + If the request fails with an HTTP error status code other than HTTP 503.
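As a complement to the example below, a hedged sketch of the `targets` and `top_k` parameters (no model is pinned here, and the mask token is model-dependent, so both are assumptions):

```py
# Sketch: restricting fill-mask predictions to candidate words.
from huggingface_hub import InferenceClient

client = InferenceClient()
predictions = client.fill_mask(
    "The goal of life is [MASK].",  # use the mask token from the model card
    targets=["happiness", "knowledge"],  # score only these candidates
    top_k=2,
)
for pred in predictions:
    print(pred.token_str, pred.score)
```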
+ + Example: + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient() + >>> client.fill_mask("The goal of life is <mask>.") + [ + FillMaskOutputElement(score=0.06897063553333282, token=11098, token_str=' happiness', sequence='The goal of life is happiness.'), + FillMaskOutputElement(score=0.06554922461509705, token=45075, token_str=' immortality', sequence='The goal of life is immortality.') + ] + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="fill-mask", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=text, + parameters={"targets": targets, "top_k": top_k}, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = self._inner_post(request_parameters) + return FillMaskOutputElement.parse_obj_as_list(response) + + def image_classification( + self, + image: ContentT, + *, + model: Optional[str] = None, + function_to_apply: Optional["ImageClassificationOutputTransform"] = None, + top_k: Optional[int] = None, + ) -> List[ImageClassificationOutputElement]: + """ + Perform image classification on the given image using the specified model. + + Args: + image (`Union[str, Path, bytes, BinaryIO, PIL.Image.Image]`): + The image to classify. It can be raw bytes, an image file, a URL to an online image, or a PIL Image. + model (`str`, *optional*): + The model to use for image classification. Can be a model ID hosted on the Hugging Face Hub or a URL to a + deployed Inference Endpoint. If not provided, the default recommended model for image classification will be used. + function_to_apply (`"ImageClassificationOutputTransform"`, *optional*): + The function to apply to the model outputs in order to retrieve the scores. + top_k (`int`, *optional*): + When specified, limits the output to the top K most probable classes. + Returns: + `List[ImageClassificationOutputElement]`: a list of [`ImageClassificationOutputElement`] items containing the predicted label and associated probability. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `HTTPError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient() + >>> client.image_classification("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg") + [ImageClassificationOutputElement(label='Blenheim spaniel', score=0.9779096841812134), ...] + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="image-classification", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=image, + parameters={"function_to_apply": function_to_apply, "top_k": top_k}, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = self._inner_post(request_parameters) + return ImageClassificationOutputElement.parse_obj_as_list(response) + + def image_segmentation( + self, + image: ContentT, + *, + model: Optional[str] = None, + mask_threshold: Optional[float] = None, + overlap_mask_area_threshold: Optional[float] = None, + subtask: Optional["ImageSegmentationSubtask"] = None, + threshold: Optional[float] = None, + ) -> List[ImageSegmentationOutputElement]: + """ + Perform image segmentation on the given image using the specified model. + + > [!WARNING] + > You must have `PIL` installed if you want to work with images (`pip install Pillow`).
+
+ Args:
+ image (`Union[str, Path, bytes, BinaryIO, PIL.Image.Image]`):
+ The image to segment. It can be raw bytes, an image file, a URL to an online image, or a PIL Image.
+ model (`str`, *optional*):
+ The model to use for image segmentation. Can be a model ID hosted on the Hugging Face Hub or a URL to a
+ deployed Inference Endpoint. If not provided, the default recommended model for image segmentation will be used.
+ mask_threshold (`float`, *optional*):
+ Threshold to use when turning the predicted masks into binary values.
+ overlap_mask_area_threshold (`float`, *optional*):
+ Mask overlap threshold to eliminate small, disconnected segments.
+ subtask (`"ImageSegmentationSubtask"`, *optional*):
+ Segmentation task to be performed, depending on model capabilities.
+ threshold (`float`, *optional*):
+ Probability threshold to filter out predicted masks.
+ Returns:
+ `List[ImageSegmentationOutputElement]`: A list of [`ImageSegmentationOutputElement`] items containing the segmented masks and associated attributes.
+
+ Raises:
+ [`InferenceTimeoutError`]:
+ If the model is unavailable or the request times out.
+ `HTTPError`:
+ If the request fails with an HTTP error status code other than HTTP 503.
+
+ Example:
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient()
+ >>> client.image_segmentation("cat.jpg")
+ [ImageSegmentationOutputElement(score=0.989008, label='LABEL_184', mask=<PIL.PngImagePlugin.PngImageFile image mode=L size=400x300>), ...]
+ ```
+ """
+ model_id = model or self.model
+ provider_helper = get_provider_helper(self.provider, task="image-segmentation", model=model_id)
+ request_parameters = provider_helper.prepare_request(
+ inputs=image,
+ parameters={
+ "mask_threshold": mask_threshold,
+ "overlap_mask_area_threshold": overlap_mask_area_threshold,
+ "subtask": subtask,
+ "threshold": threshold,
+ },
+ headers=self.headers,
+ model=model_id,
+ api_key=self.token,
+ )
+ response = self._inner_post(request_parameters)
+ output = ImageSegmentationOutputElement.parse_obj_as_list(response)
+ for item in output:
+ item.mask = _b64_to_image(item.mask) # type: ignore [assignment]
+ return output
+
+ def image_to_image(
+ self,
+ image: ContentT,
+ prompt: Optional[str] = None,
+ *,
+ negative_prompt: Optional[str] = None,
+ num_inference_steps: Optional[int] = None,
+ guidance_scale: Optional[float] = None,
+ model: Optional[str] = None,
+ target_size: Optional[ImageToImageTargetSize] = None,
+ **kwargs,
+ ) -> "Image":
+ """
+ Perform image-to-image translation using a specified model.
+
+ > [!WARNING]
+ > You must have `PIL` installed if you want to work with images (`pip install Pillow`).
+
+ Args:
+ image (`Union[str, Path, bytes, BinaryIO, PIL.Image.Image]`):
+ The input image for translation. It can be raw bytes, an image file, a URL to an online image, or a PIL Image.
+ prompt (`str`, *optional*):
+ The text prompt to guide the image generation.
+ negative_prompt (`str`, *optional*):
+ One prompt to guide what NOT to include in image generation.
+ num_inference_steps (`int`, *optional*):
+ For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher
+ quality image at the expense of slower inference.
+ guidance_scale (`float`, *optional*):
+ For diffusion models. A higher guidance scale value encourages the model to generate images closely
+ linked to the text prompt at the expense of lower image quality.
+ model (`str`, *optional*):
+ The model to use for inference.
Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
+ Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
+ target_size (`ImageToImageTargetSize`, *optional*):
+ The size in pixels of the output image. This parameter is only supported by some providers and for
+ specific models. It will be ignored when unsupported.
+
+ Returns:
+ `Image`: The translated image.
+
+ Raises:
+ [`InferenceTimeoutError`]:
+ If the model is unavailable or the request times out.
+ `HTTPError`:
+ If the request fails with an HTTP error status code other than HTTP 503.
+
+ Example:
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient()
+ >>> image = client.image_to_image("cat.jpg", prompt="turn the cat into a tiger")
+ >>> image.save("tiger.jpg")
+ ```
+ """
+ model_id = model or self.model
+ provider_helper = get_provider_helper(self.provider, task="image-to-image", model=model_id)
+ request_parameters = provider_helper.prepare_request(
+ inputs=image,
+ parameters={
+ "prompt": prompt,
+ "negative_prompt": negative_prompt,
+ "target_size": target_size,
+ "num_inference_steps": num_inference_steps,
+ "guidance_scale": guidance_scale,
+ **kwargs,
+ },
+ headers=self.headers,
+ model=model_id,
+ api_key=self.token,
+ )
+ response = self._inner_post(request_parameters)
+ response = provider_helper.get_response(response, request_parameters)
+ return _bytes_to_image(response)
+
+ def image_to_video(
+ self,
+ image: ContentT,
+ *,
+ model: Optional[str] = None,
+ prompt: Optional[str] = None,
+ negative_prompt: Optional[str] = None,
+ num_frames: Optional[float] = None,
+ num_inference_steps: Optional[int] = None,
+ guidance_scale: Optional[float] = None,
+ seed: Optional[int] = None,
+ target_size: Optional[ImageToVideoTargetSize] = None,
+ **kwargs,
+ ) -> bytes:
+ """
+ Generate a video from an input image.
+
+ Args:
+ image (`Union[str, Path, bytes, BinaryIO, PIL.Image.Image]`):
+ The input image to generate a video from. It can be raw bytes, an image file, a URL to an online image, or a PIL Image.
+ model (`str`, *optional*):
+ The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
+ Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
+ prompt (`str`, *optional*):
+ The text prompt to guide the video generation.
+ negative_prompt (`str`, *optional*):
+ One prompt to guide what NOT to include in video generation.
+ num_frames (`float`, *optional*):
+ The num_frames parameter determines how many video frames are generated.
+ num_inference_steps (`int`, *optional*):
+ For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher
+ quality video at the expense of slower inference.
+ guidance_scale (`float`, *optional*):
+ For diffusion models. A higher guidance scale value encourages the model to generate videos closely
+ linked to the text prompt at the expense of lower video quality.
+ seed (`int`, *optional*):
+ The seed to use for the video generation.
+ target_size (`ImageToVideoTargetSize`, *optional*):
+ The size in pixels of the output video frames.
+
+ Returns:
+ `bytes`: The generated video.
+
+ Examples:
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient()
+ >>> video = client.image_to_video("cat.jpg", model="Wan-AI/Wan2.2-I2V-A14B", prompt="turn the cat into a tiger")
+ >>> with open("tiger.mp4", "wb") as f:
+ ... f.write(video)
+ ```
+ """
+ model_id = model or self.model
+ provider_helper = get_provider_helper(self.provider, task="image-to-video", model=model_id)
+ request_parameters = provider_helper.prepare_request(
+ inputs=image,
+ parameters={
+ "prompt": prompt,
+ "negative_prompt": negative_prompt,
+ "num_frames": num_frames,
+ "num_inference_steps": num_inference_steps,
+ "guidance_scale": guidance_scale,
+ "seed": seed,
+ "target_size": target_size,
+ **kwargs,
+ },
+ headers=self.headers,
+ model=model_id,
+ api_key=self.token,
+ )
+ response = self._inner_post(request_parameters)
+ response = provider_helper.get_response(response, request_parameters)
+ return response
+
+ def image_to_text(self, image: ContentT, *, model: Optional[str] = None) -> ImageToTextOutput:
+ """
+ Takes an input image and returns text.
+
+ Models can have very different outputs depending on your use case (image captioning, optical character recognition
+ (OCR), Pix2Struct, etc.). Please have a look at the model card to learn more about a model's specificities.
+
+ Args:
+ image (`Union[str, Path, bytes, BinaryIO, PIL.Image.Image]`):
+ The input image to caption. It can be raw bytes, an image file, a URL to an online image, or a PIL Image.
+ model (`str`, *optional*):
+ The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
+ Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
+
+ Returns:
+ [`ImageToTextOutput`]: The generated text.
+
+ Raises:
+ [`InferenceTimeoutError`]:
+ If the model is unavailable or the request times out.
+ `HTTPError`:
+ If the request fails with an HTTP error status code other than HTTP 503.
+
+ Example:
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient()
+ >>> client.image_to_text("cat.jpg")
+ 'a cat standing in a grassy field '
+ >>> client.image_to_text("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
+ 'a dog laying on the grass next to a flower pot '
+ ```
+ """
+ model_id = model or self.model
+ provider_helper = get_provider_helper(self.provider, task="image-to-text", model=model_id)
+ request_parameters = provider_helper.prepare_request(
+ inputs=image,
+ parameters={},
+ headers=self.headers,
+ model=model_id,
+ api_key=self.token,
+ )
+ response = self._inner_post(request_parameters)
+ output_list: List[ImageToTextOutput] = ImageToTextOutput.parse_obj_as_list(response)
+ return output_list[0]
+
+ def object_detection(
+ self, image: ContentT, *, model: Optional[str] = None, threshold: Optional[float] = None
+ ) -> List[ObjectDetectionOutputElement]:
+ """
+ Perform object detection on the given image using the specified model.
+
+ > [!WARNING]
+ > You must have `PIL` installed if you want to work with images (`pip install Pillow`).
+
+ Args:
+ image (`Union[str, Path, bytes, BinaryIO, PIL.Image.Image]`):
+ The image to detect objects on. It can be raw bytes, an image file, a URL to an online image, or a PIL Image.
+ model (`str`, *optional*):
+ The model to use for object detection. Can be a model ID hosted on the Hugging Face Hub or a URL to a
+ deployed Inference Endpoint.
If not provided, the default recommended model for object detection (DETR) will be used.
+ threshold (`float`, *optional*):
+ The probability necessary to make a prediction.
+ Returns:
+ `List[ObjectDetectionOutputElement]`: A list of [`ObjectDetectionOutputElement`] items containing the bounding boxes and associated attributes.
+
+ Raises:
+ [`InferenceTimeoutError`]:
+ If the model is unavailable or the request times out.
+ `HTTPError`:
+ If the request fails with an HTTP error status code other than HTTP 503.
+ `ValueError`:
+ If the request output is not a List.
+
+ Example:
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient()
+ >>> client.object_detection("people.jpg")
+ [ObjectDetectionOutputElement(score=0.9486683011054993, label='person', box=ObjectDetectionBoundingBox(xmin=59, ymin=39, xmax=420, ymax=510)), ...]
+ ```
+ """
+ model_id = model or self.model
+ provider_helper = get_provider_helper(self.provider, task="object-detection", model=model_id)
+ request_parameters = provider_helper.prepare_request(
+ inputs=image,
+ parameters={"threshold": threshold},
+ headers=self.headers,
+ model=model_id,
+ api_key=self.token,
+ )
+ response = self._inner_post(request_parameters)
+ return ObjectDetectionOutputElement.parse_obj_as_list(response)
+
+ def question_answering(
+ self,
+ question: str,
+ context: str,
+ *,
+ model: Optional[str] = None,
+ align_to_words: Optional[bool] = None,
+ doc_stride: Optional[int] = None,
+ handle_impossible_answer: Optional[bool] = None,
+ max_answer_len: Optional[int] = None,
+ max_question_len: Optional[int] = None,
+ max_seq_len: Optional[int] = None,
+ top_k: Optional[int] = None,
+ ) -> Union[QuestionAnsweringOutputElement, List[QuestionAnsweringOutputElement]]:
+ """
+ Retrieve the answer to a question from a given text.
+
+ Args:
+ question (`str`):
+ Question to be answered.
+ context (`str`):
+ The context of the question.
+ model (`str`, *optional*):
+ The model to use for the question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to
+ a deployed Inference Endpoint.
+ align_to_words (`bool`, *optional*):
+ Attempts to align the answer to real words. Improves quality on space-separated languages. Might hurt
+ on non-space-separated languages (like Japanese or Chinese).
+ doc_stride (`int`, *optional*):
+ If the context is too long to fit with the question for the model, it will be split in several chunks
+ with some overlap. This argument controls the size of that overlap.
+ handle_impossible_answer (`bool`, *optional*):
+ Whether to accept impossible as an answer.
+ max_answer_len (`int`, *optional*):
+ The maximum length of predicted answers (e.g., only answers with a shorter length are considered).
+ max_question_len (`int`, *optional*):
+ The maximum length of the question after tokenization. It will be truncated if needed.
+ max_seq_len (`int`, *optional*):
+ The maximum length of the total sentence (context + question) in tokens of each chunk passed to the
+ model. The context will be split in several chunks (using `doc_stride` as overlap) if needed.
+ top_k (`int`, *optional*):
+ The number of answers to return (will be chosen by order of likelihood). Note that we return fewer than
+ top_k answers if there are not enough options available within the context.
+
+ Returns:
+ Union[`QuestionAnsweringOutputElement`, List[`QuestionAnsweringOutputElement`]]:
+ When top_k is 1 or not provided, it returns a single `QuestionAnsweringOutputElement`.
+ When top_k is greater than 1, it returns a list of `QuestionAnsweringOutputElement`. + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `HTTPError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient() + >>> client.question_answering(question="What's my name?", context="My name is Clara and I live in Berkeley.") + QuestionAnsweringOutputElement(answer='Clara', end=16, score=0.9326565265655518, start=11) + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="question-answering", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs={"question": question, "context": context}, + parameters={ + "align_to_words": align_to_words, + "doc_stride": doc_stride, + "handle_impossible_answer": handle_impossible_answer, + "max_answer_len": max_answer_len, + "max_question_len": max_question_len, + "max_seq_len": max_seq_len, + "top_k": top_k, + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = self._inner_post(request_parameters) + # Parse the response as a single `QuestionAnsweringOutputElement` when top_k is 1 or not provided, or a list of `QuestionAnsweringOutputElement` to ensure backward compatibility. + output = QuestionAnsweringOutputElement.parse_obj(response) + return output + + def sentence_similarity( + self, sentence: str, other_sentences: List[str], *, model: Optional[str] = None + ) -> List[float]: + """ + Compute the semantic similarity between a sentence and a list of other sentences by comparing their embeddings. + + Args: + sentence (`str`): + The main sentence to compare to others. + other_sentences (`List[str]`): + The list of sentences to compare to. + model (`str`, *optional*): + The model to use for the sentence similarity task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. If not provided, the default recommended sentence similarity model will be used. + Defaults to None. + + Returns: + `List[float]`: The similarity scores between the main sentence and the given comparison sentences. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `HTTPError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient() + >>> client.sentence_similarity( + ... "Machine learning is so easy.", + ... other_sentences=[ + ... "Deep learning is so straightforward.", + ... "This is so difficult, like rocket science.", + ... "I can't believe how much I struggled with this.", + ... ], + ... 
)
+ [0.7785726189613342, 0.45876261591911316, 0.2906220555305481]
+ ```
+ """
+ model_id = model or self.model
+ provider_helper = get_provider_helper(self.provider, task="sentence-similarity", model=model_id)
+ request_parameters = provider_helper.prepare_request(
+ inputs={"source_sentence": sentence, "sentences": other_sentences},
+ parameters={},
+ extra_payload={},
+ headers=self.headers,
+ model=model_id,
+ api_key=self.token,
+ )
+ response = self._inner_post(request_parameters)
+ return _bytes_to_list(response)
+
+ def summarization(
+ self,
+ text: str,
+ *,
+ model: Optional[str] = None,
+ clean_up_tokenization_spaces: Optional[bool] = None,
+ generate_parameters: Optional[Dict[str, Any]] = None,
+ truncation: Optional["SummarizationTruncationStrategy"] = None,
+ ) -> SummarizationOutput:
+ """
+ Generate a summary of a given text using a specified model.
+
+ Args:
+ text (`str`):
+ The input text to summarize.
+ model (`str`, *optional*):
+ The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
+ Inference Endpoint. If not provided, the default recommended model for summarization will be used.
+ clean_up_tokenization_spaces (`bool`, *optional*):
+ Whether to clean up the potential extra spaces in the text output.
+ generate_parameters (`Dict[str, Any]`, *optional*):
+ Additional parametrization of the text generation algorithm.
+ truncation (`"SummarizationTruncationStrategy"`, *optional*):
+ The truncation strategy to use.
+ Returns:
+ [`SummarizationOutput`]: The generated summary text.
+
+ Raises:
+ [`InferenceTimeoutError`]:
+ If the model is unavailable or the request times out.
+ `HTTPError`:
+ If the request fails with an HTTP error status code other than HTTP 503.
+
+ Example:
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient()
+ >>> client.summarization("The Eiffel tower...")
+ SummarizationOutput(generated_text="The Eiffel tower is one of the most famous landmarks in the world....")
+ ```
+ """
+ parameters = {
+ "clean_up_tokenization_spaces": clean_up_tokenization_spaces,
+ "generate_parameters": generate_parameters,
+ "truncation": truncation,
+ }
+ model_id = model or self.model
+ provider_helper = get_provider_helper(self.provider, task="summarization", model=model_id)
+ request_parameters = provider_helper.prepare_request(
+ inputs=text,
+ parameters=parameters,
+ headers=self.headers,
+ model=model_id,
+ api_key=self.token,
+ )
+ response = self._inner_post(request_parameters)
+ return SummarizationOutput.parse_obj_as_list(response)[0]
+
+ def table_question_answering(
+ self,
+ table: Dict[str, Any],
+ query: str,
+ *,
+ model: Optional[str] = None,
+ padding: Optional["Padding"] = None,
+ sequential: Optional[bool] = None,
+ truncation: Optional[bool] = None,
+ ) -> TableQuestionAnsweringOutputElement:
+ """
+ Retrieve the answer to a question from information given in a table.
+
+ Args:
+ table (`Dict[str, Any]`):
+ A table of data represented as a dict of lists where entries are headers and the lists are all the
+ values; all lists must have the same size.
+ query (`str`):
+ The query in plain text that you want to ask the table.
+ model (`str`, *optional*):
+ The model to use for the table-question-answering task. Can be a model ID hosted on the Hugging Face
+ Hub or a URL to a deployed Inference Endpoint.
+ padding (`"Padding"`, *optional*):
+ Activates and controls padding.
+ sequential (`bool`, *optional*):
+ Whether to do inference sequentially or as a batch.
Batching is faster, but models like SQA require the + inference to be done sequentially to extract relations within sequences, given their conversational + nature. + truncation (`bool`, *optional*): + Activates and controls truncation. + + Returns: + [`TableQuestionAnsweringOutputElement`]: a table question answering output containing the answer, coordinates, cells and the aggregator used. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `HTTPError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient() + >>> query = "How many stars does the transformers repository have?" + >>> table = {"Repository": ["Transformers", "Datasets", "Tokenizers"], "Stars": ["36542", "4512", "3934"]} + >>> client.table_question_answering(table, query, model="google/tapas-base-finetuned-wtq") + TableQuestionAnsweringOutputElement(answer='36542', coordinates=[[0, 1]], cells=['36542'], aggregator='AVERAGE') + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="table-question-answering", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs={"query": query, "table": table}, + parameters={"model": model, "padding": padding, "sequential": sequential, "truncation": truncation}, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = self._inner_post(request_parameters) + return TableQuestionAnsweringOutputElement.parse_obj_as_instance(response) + + def tabular_classification(self, table: Dict[str, Any], *, model: Optional[str] = None) -> List[str]: + """ + Classifying a target category (a group) based on a set of attributes. + + Args: + table (`Dict[str, Any]`): + Set of attributes to classify. + model (`str`, *optional*): + The model to use for the tabular classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. If not provided, the default recommended tabular classification model will be used. + Defaults to None. + + Returns: + `List`: a list of labels, one per row in the initial table. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `HTTPError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient() + >>> table = { + ... "fixed_acidity": ["7.4", "7.8", "10.3"], + ... "volatile_acidity": ["0.7", "0.88", "0.32"], + ... "citric_acid": ["0", "0", "0.45"], + ... "residual_sugar": ["1.9", "2.6", "6.4"], + ... "chlorides": ["0.076", "0.098", "0.073"], + ... "free_sulfur_dioxide": ["11", "25", "5"], + ... "total_sulfur_dioxide": ["34", "67", "13"], + ... "density": ["0.9978", "0.9968", "0.9976"], + ... "pH": ["3.51", "3.2", "3.23"], + ... "sulphates": ["0.56", "0.68", "0.82"], + ... "alcohol": ["9.4", "9.8", "12.6"], + ... 
} + >>> client.tabular_classification(table=table, model="julien-c/wine-quality") + ["5", "5", "5"] + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="tabular-classification", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=None, + extra_payload={"table": table}, + parameters={}, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = self._inner_post(request_parameters) + return _bytes_to_list(response) + + def tabular_regression(self, table: Dict[str, Any], *, model: Optional[str] = None) -> List[float]: + """ + Predicting a numerical target value given a set of attributes/features in a table. + + Args: + table (`Dict[str, Any]`): + Set of attributes stored in a table. The attributes used to predict the target can be both numerical and categorical. + model (`str`, *optional*): + The model to use for the tabular regression task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. If not provided, the default recommended tabular regression model will be used. + Defaults to None. + + Returns: + `List`: a list of predicted numerical target values. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `HTTPError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient() + >>> table = { + ... "Height": ["11.52", "12.48", "12.3778"], + ... "Length1": ["23.2", "24", "23.9"], + ... "Length2": ["25.4", "26.3", "26.5"], + ... "Length3": ["30", "31.2", "31.1"], + ... "Species": ["Bream", "Bream", "Bream"], + ... "Width": ["4.02", "4.3056", "4.6961"], + ... } + >>> client.tabular_regression(table, model="scikit-learn/Fish-Weight") + [110, 120, 130] + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="tabular-regression", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=None, + parameters={}, + extra_payload={"table": table}, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = self._inner_post(request_parameters) + return _bytes_to_list(response) + + def text_classification( + self, + text: str, + *, + model: Optional[str] = None, + top_k: Optional[int] = None, + function_to_apply: Optional["TextClassificationOutputTransform"] = None, + ) -> List[TextClassificationOutputElement]: + """ + Perform text classification (e.g. sentiment-analysis) on the given text. + + Args: + text (`str`): + A string to be classified. + model (`str`, *optional*): + The model to use for the text classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. If not provided, the default recommended text classification model will be used. + Defaults to None. + top_k (`int`, *optional*): + When specified, limits the output to the top K most probable classes. + function_to_apply (`"TextClassificationOutputTransform"`, *optional*): + The function to apply to the model outputs in order to retrieve the scores. + + Returns: + `List[TextClassificationOutputElement]`: a list of [`TextClassificationOutputElement`] items containing the predicted label and associated probability. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. 
+ `HTTPError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient() + >>> client.text_classification("I like you") + [ + TextClassificationOutputElement(label='POSITIVE', score=0.9998695850372314), + TextClassificationOutputElement(label='NEGATIVE', score=0.0001304351753788069), + ] + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="text-classification", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=text, + parameters={ + "function_to_apply": function_to_apply, + "top_k": top_k, + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = self._inner_post(request_parameters) + return TextClassificationOutputElement.parse_obj_as_list(response)[0] # type: ignore [return-value] + + @overload + def text_generation( + self, + prompt: str, + *, + details: Literal[True], + stream: Literal[True], + model: Optional[str] = None, + # Parameters from `TextGenerationInputGenerateParameters` (maintained manually) + adapter_id: Optional[str] = None, + best_of: Optional[int] = None, + decoder_input_details: Optional[bool] = None, + do_sample: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + grammar: Optional[TextGenerationInputGrammarType] = None, + max_new_tokens: Optional[int] = None, + repetition_penalty: Optional[float] = None, + return_full_text: Optional[bool] = None, + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stop_sequences: Optional[List[str]] = None, # Deprecated, use `stop` instead + temperature: Optional[float] = None, + top_k: Optional[int] = None, + top_n_tokens: Optional[int] = None, + top_p: Optional[float] = None, + truncate: Optional[int] = None, + typical_p: Optional[float] = None, + watermark: Optional[bool] = None, + ) -> Iterable[TextGenerationStreamOutput]: ... + + @overload + def text_generation( + self, + prompt: str, + *, + details: Literal[True], + stream: Optional[Literal[False]] = None, + model: Optional[str] = None, + # Parameters from `TextGenerationInputGenerateParameters` (maintained manually) + adapter_id: Optional[str] = None, + best_of: Optional[int] = None, + decoder_input_details: Optional[bool] = None, + do_sample: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + grammar: Optional[TextGenerationInputGrammarType] = None, + max_new_tokens: Optional[int] = None, + repetition_penalty: Optional[float] = None, + return_full_text: Optional[bool] = None, + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stop_sequences: Optional[List[str]] = None, # Deprecated, use `stop` instead + temperature: Optional[float] = None, + top_k: Optional[int] = None, + top_n_tokens: Optional[int] = None, + top_p: Optional[float] = None, + truncate: Optional[int] = None, + typical_p: Optional[float] = None, + watermark: Optional[bool] = None, + ) -> TextGenerationOutput: ... 
+ + @overload + def text_generation( + self, + prompt: str, + *, + details: Optional[Literal[False]] = None, + stream: Literal[True], + model: Optional[str] = None, + # Parameters from `TextGenerationInputGenerateParameters` (maintained manually) + adapter_id: Optional[str] = None, + best_of: Optional[int] = None, + decoder_input_details: Optional[bool] = None, + do_sample: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + grammar: Optional[TextGenerationInputGrammarType] = None, + max_new_tokens: Optional[int] = None, + repetition_penalty: Optional[float] = None, + return_full_text: Optional[bool] = None, # Manual default value + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stop_sequences: Optional[List[str]] = None, # Deprecated, use `stop` instead + temperature: Optional[float] = None, + top_k: Optional[int] = None, + top_n_tokens: Optional[int] = None, + top_p: Optional[float] = None, + truncate: Optional[int] = None, + typical_p: Optional[float] = None, + watermark: Optional[bool] = None, + ) -> Iterable[str]: ... + + @overload + def text_generation( + self, + prompt: str, + *, + details: Optional[Literal[False]] = None, + stream: Optional[Literal[False]] = None, + model: Optional[str] = None, + # Parameters from `TextGenerationInputGenerateParameters` (maintained manually) + adapter_id: Optional[str] = None, + best_of: Optional[int] = None, + decoder_input_details: Optional[bool] = None, + do_sample: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + grammar: Optional[TextGenerationInputGrammarType] = None, + max_new_tokens: Optional[int] = None, + repetition_penalty: Optional[float] = None, + return_full_text: Optional[bool] = None, + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stop_sequences: Optional[List[str]] = None, # Deprecated, use `stop` instead + temperature: Optional[float] = None, + top_k: Optional[int] = None, + top_n_tokens: Optional[int] = None, + top_p: Optional[float] = None, + truncate: Optional[int] = None, + typical_p: Optional[float] = None, + watermark: Optional[bool] = None, + ) -> str: ... + + @overload + def text_generation( + self, + prompt: str, + *, + details: Optional[bool] = None, + stream: Optional[bool] = None, + model: Optional[str] = None, + # Parameters from `TextGenerationInputGenerateParameters` (maintained manually) + adapter_id: Optional[str] = None, + best_of: Optional[int] = None, + decoder_input_details: Optional[bool] = None, + do_sample: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + grammar: Optional[TextGenerationInputGrammarType] = None, + max_new_tokens: Optional[int] = None, + repetition_penalty: Optional[float] = None, + return_full_text: Optional[bool] = None, + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stop_sequences: Optional[List[str]] = None, # Deprecated, use `stop` instead + temperature: Optional[float] = None, + top_k: Optional[int] = None, + top_n_tokens: Optional[int] = None, + top_p: Optional[float] = None, + truncate: Optional[int] = None, + typical_p: Optional[float] = None, + watermark: Optional[bool] = None, + ) -> Union[str, TextGenerationOutput, Iterable[str], Iterable[TextGenerationStreamOutput]]: ... 
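+
+ # A minimal illustration of how the overloads above resolve in practice (a sketch, not an
+ # exhaustive contract; assumes all other parameters are left at their defaults):
+ # client.text_generation("Hello") -> str
+ # client.text_generation("Hello", stream=True) -> Iterable[str]
+ # client.text_generation("Hello", details=True) -> TextGenerationOutput
+ # client.text_generation("Hello", details=True, stream=True) -> Iterable[TextGenerationStreamOutput]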
+
+ def text_generation(
+ self,
+ prompt: str,
+ *,
+ details: Optional[bool] = None,
+ stream: Optional[bool] = None,
+ model: Optional[str] = None,
+ # Parameters from `TextGenerationInputGenerateParameters` (maintained manually)
+ adapter_id: Optional[str] = None,
+ best_of: Optional[int] = None,
+ decoder_input_details: Optional[bool] = None,
+ do_sample: Optional[bool] = None,
+ frequency_penalty: Optional[float] = None,
+ grammar: Optional[TextGenerationInputGrammarType] = None,
+ max_new_tokens: Optional[int] = None,
+ repetition_penalty: Optional[float] = None,
+ return_full_text: Optional[bool] = None,
+ seed: Optional[int] = None,
+ stop: Optional[List[str]] = None,
+ stop_sequences: Optional[List[str]] = None, # Deprecated, use `stop` instead
+ temperature: Optional[float] = None,
+ top_k: Optional[int] = None,
+ top_n_tokens: Optional[int] = None,
+ top_p: Optional[float] = None,
+ truncate: Optional[int] = None,
+ typical_p: Optional[float] = None,
+ watermark: Optional[bool] = None,
+ ) -> Union[str, TextGenerationOutput, Iterable[str], Iterable[TextGenerationStreamOutput]]:
+ """
+ Given a prompt, generate the following text.
+
+ > [!TIP]
+ > If you want to generate a response from chat messages, you should use the [`InferenceClient.chat_completion`] method.
+ > It accepts a list of messages instead of a single text prompt and handles the chat templating for you.
+
+ Args:
+ prompt (`str`):
+ Input text.
+ details (`bool`, *optional*):
+ By default, text_generation returns a string. Pass `details=True` if you want a detailed output (tokens,
+ probabilities, seed, finish reason, etc.). Only available for models running with the
+ `text-generation-inference` backend.
+ stream (`bool`, *optional*):
+ By default, text_generation returns the full generated text. Pass `stream=True` if you want a stream of
+ tokens to be returned. Only available for models running with the `text-generation-inference`
+ backend.
+ model (`str`, *optional*):
+ The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
+ Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
+ adapter_id (`str`, *optional*):
+ LoRA adapter ID.
+ best_of (`int`, *optional*):
+ Generate best_of sequences and return the one with the highest token logprobs.
+ decoder_input_details (`bool`, *optional*):
+ Return the decoder input token logprobs and ids. You must set `details=True` as well for it to be taken
+ into account. Defaults to `False`.
+ do_sample (`bool`, *optional*):
+ Activate logits sampling.
+ frequency_penalty (`float`, *optional*):
+ Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in
+ the text so far, decreasing the model's likelihood to repeat the same line verbatim.
+ grammar ([`TextGenerationInputGrammarType`], *optional*):
+ Grammar constraints. Can be either a JSONSchema or a regex.
+ max_new_tokens (`int`, *optional*):
+ Maximum number of generated tokens. Defaults to 100.
+ repetition_penalty (`float`, *optional*):
+ The parameter for repetition penalty. 1.0 means no penalty. See [this
+ paper](https://arxiv.org/pdf/1909.05858.pdf) for more details.
+ return_full_text (`bool`, *optional*):
+ Whether to prepend the prompt to the generated text.
+ seed (`int`, *optional*):
+ Random sampling seed.
+ stop (`List[str]`, *optional*):
+ Stop generating tokens if a member of `stop` is generated.
+ stop_sequences (`List[str]`, *optional*):
+ Deprecated argument.
Use `stop` instead.
+ temperature (`float`, *optional*):
+ The value used to modulate the logits distribution.
+ top_n_tokens (`int`, *optional*):
+ Return information about the `top_n_tokens` most likely tokens at each generation step, instead of
+ just the sampled token.
+ top_k (`int`, *optional*):
+ The number of highest probability vocabulary tokens to keep for top-k-filtering.
+ top_p (`float`, *optional*):
+ If set to < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or
+ higher are kept for generation.
+ truncate (`int`, *optional*):
+ Truncate input tokens to the given size.
+ typical_p (`float`, *optional*):
+ Typical Decoding mass. See [Typical Decoding for Natural Language Generation](https://arxiv.org/abs/2202.00666) for more information.
+ watermark (`bool`, *optional*):
+ Watermarking with [A Watermark for Large Language Models](https://arxiv.org/abs/2301.10226).
+
+ Returns:
+ `Union[str, TextGenerationOutput, Iterable[str], Iterable[TextGenerationStreamOutput]]`:
+ Generated text returned from the server:
+ - if `stream=False` and `details=False`, the generated text is returned as a `str` (default)
+ - if `stream=True` and `details=False`, the generated text is returned token by token as an `Iterable[str]`
+ - if `stream=False` and `details=True`, the generated text is returned with more details as a [`~huggingface_hub.TextGenerationOutput`]
+ - if `details=True` and `stream=True`, the generated text is returned token by token as an iterable of [`~huggingface_hub.TextGenerationStreamOutput`]
+
+ Raises:
+ `ValidationError`:
+ If input values are not valid. No HTTP call is made to the server.
+ [`InferenceTimeoutError`]:
+ If the model is unavailable or the request times out.
+ `HTTPError`:
+ If the request fails with an HTTP error status code other than HTTP 503.
+
+ Example:
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient()
+
+ # Case 1: generate text
+ >>> client.text_generation("The huggingface_hub library is ", max_new_tokens=12)
+ '100% open source and built to be easy to use.'
+
+ # Case 2: iterate over the generated tokens. Useful for large generation.
+ >>> for token in client.text_generation("The huggingface_hub library is ", max_new_tokens=12, stream=True):
+ ... print(token)
+ 100
+ %
+ open
+ source
+ and
+ built
+ to
+ be
+ easy
+ to
+ use
+ .
+
+ # Case 3: get more details about the generation process.
+ >>> client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True)
+ TextGenerationOutput(
+ generated_text='100% open source and built to be easy to use.',
+ details=TextGenerationDetails(
+ finish_reason='length',
+ generated_tokens=12,
+ seed=None,
+ prefill=[
+ TextGenerationPrefillOutputToken(id=487, text='The', logprob=None),
+ TextGenerationPrefillOutputToken(id=53789, text=' hugging', logprob=-13.171875),
+ (...)
+ TextGenerationPrefillOutputToken(id=204, text=' ', logprob=-7.0390625)
+ ],
+ tokens=[
+ TokenElement(id=1425, text='100', logprob=-1.0175781, special=False),
+ TokenElement(id=16, text='%', logprob=-0.0463562, special=False),
+ (...)
+ TokenElement(id=25, text='.', logprob=-0.5703125, special=False)
+ ],
+ best_of_sequences=None
+ )
+ )
+
+ # Case 4: iterate over the generated tokens with more details.
+ # Last object is more complete, containing the full generated text and the finish reason.
+ >>> for details in client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True, stream=True):
+ ...
print(details) + ... + TextGenerationStreamOutput(token=TokenElement(id=1425, text='100', logprob=-1.0175781, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=16, text='%', logprob=-0.0463562, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=1314, text=' open', logprob=-1.3359375, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=3178, text=' source', logprob=-0.28100586, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=273, text=' and', logprob=-0.5961914, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=3426, text=' built', logprob=-1.9423828, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=271, text=' to', logprob=-1.4121094, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=314, text=' be', logprob=-1.5224609, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=1833, text=' easy', logprob=-2.1132812, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=271, text=' to', logprob=-0.08520508, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=745, text=' use', logprob=-0.39453125, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement( + id=25, + text='.', + logprob=-0.5703125, + special=False), + generated_text='100% open source and built to be easy to use.', + details=TextGenerationStreamOutputStreamDetails(finish_reason='length', generated_tokens=12, seed=None) + ) + + # Case 5: generate constrained output using grammar + >>> response = client.text_generation( + ... prompt="I saw a puppy a cat and a raccoon during my bike ride in the park", + ... model="HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1", + ... max_new_tokens=100, + ... repetition_penalty=1.3, + ... grammar={ + ... "type": "json", + ... "value": { + ... "properties": { + ... "location": {"type": "string"}, + ... "activity": {"type": "string"}, + ... "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5}, + ... "animals": {"type": "array", "items": {"type": "string"}}, + ... }, + ... "required": ["location", "activity", "animals_seen", "animals"], + ... }, + ... }, + ... ) + >>> json.loads(response) + { + "activity": "bike riding", + "animals": ["puppy", "cat", "raccoon"], + "animals_seen": 3, + "location": "park" + } + ``` + """ + if decoder_input_details and not details: + warnings.warn( + "`decoder_input_details=True` has been passed to the server but `details=False` is set meaning that" + " the output from the server will be truncated." + ) + decoder_input_details = False + + if stop_sequences is not None: + warnings.warn( + "`stop_sequences` is a deprecated argument for `text_generation` task" + " and will be removed in version '0.28.0'. 
Use `stop` instead.",
+ FutureWarning,
+ )
+ if stop is None:
+ stop = stop_sequences # use deprecated arg if provided
+
+ # Build payload
+ parameters = {
+ "adapter_id": adapter_id,
+ "best_of": best_of,
+ "decoder_input_details": decoder_input_details,
+ "details": details,
+ "do_sample": do_sample,
+ "frequency_penalty": frequency_penalty,
+ "grammar": grammar,
+ "max_new_tokens": max_new_tokens,
+ "repetition_penalty": repetition_penalty,
+ "return_full_text": return_full_text,
+ "seed": seed,
+ "stop": stop,
+ "temperature": temperature,
+ "top_k": top_k,
+ "top_n_tokens": top_n_tokens,
+ "top_p": top_p,
+ "truncate": truncate,
+ "typical_p": typical_p,
+ "watermark": watermark,
+ }
+
+ # Remove some parameters if not a TGI server
+ unsupported_kwargs = _get_unsupported_text_generation_kwargs(model)
+ if len(unsupported_kwargs) > 0:
+ # The server does not support some parameters
+ # => means it is not a TGI server
+ # => remove unsupported parameters and warn the user
+
+ ignored_parameters = []
+ for key in unsupported_kwargs:
+ if parameters.get(key):
+ ignored_parameters.append(key)
+ parameters.pop(key, None)
+ if len(ignored_parameters) > 0:
+ warnings.warn(
+ "API endpoint/model for text-generation is not served via TGI. Ignoring following parameters:"
+ f" {', '.join(ignored_parameters)}.",
+ UserWarning,
+ )
+ if details:
+ warnings.warn(
+ "API endpoint/model for text-generation is not served via TGI. Parameter `details=True` will"
+ " be ignored meaning only the generated text will be returned.",
+ UserWarning,
+ )
+ details = False
+ if stream:
+ raise ValueError(
+ "API endpoint/model for text-generation is not served via TGI. Cannot return output as a stream."
+ " Please pass `stream=False` as input."
+ )
+
+ model_id = model or self.model
+ provider_helper = get_provider_helper(self.provider, task="text-generation", model=model_id)
+ request_parameters = provider_helper.prepare_request(
+ inputs=prompt,
+ parameters=parameters,
+ extra_payload={"stream": stream},
+ headers=self.headers,
+ model=model_id,
+ api_key=self.token,
+ )
+
+ # Handle errors separately for more precise error messages
+ try:
+ bytes_output = self._inner_post(request_parameters, stream=stream or False)
+ except HTTPError as e:
+ match = MODEL_KWARGS_NOT_USED_REGEX.search(str(e))
+ if isinstance(e, BadRequestError) and match:
+ unused_params = [kwarg.strip("' ") for kwarg in match.group(1).split(",")]
+ _set_unsupported_text_generation_kwargs(model, unused_params)
+ return self.text_generation( # type: ignore
+ prompt=prompt,
+ details=details,
+ stream=stream,
+ model=model_id,
+ adapter_id=adapter_id,
+ best_of=best_of,
+ decoder_input_details=decoder_input_details,
+ do_sample=do_sample,
+ frequency_penalty=frequency_penalty,
+ grammar=grammar,
+ max_new_tokens=max_new_tokens,
+ repetition_penalty=repetition_penalty,
+ return_full_text=return_full_text,
+ seed=seed,
+ stop=stop,
+ temperature=temperature,
+ top_k=top_k,
+ top_n_tokens=top_n_tokens,
+ top_p=top_p,
+ truncate=truncate,
+ typical_p=typical_p,
+ watermark=watermark,
+ )
+ raise_text_generation_error(e)
+
+ # Parse output
+ if stream:
+ return _stream_text_generation_response(bytes_output, details) # type: ignore
+
+ data = _bytes_to_dict(bytes_output) # type: ignore[arg-type]
+
+ # Data can be a single element (dict) or a list of dicts, in which case we select the first element.
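+ # e.g. {"generated_text": "..."} or [{"generated_text": "..."}] (illustrative shapes; the exact payload depends on the provider)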
+ if isinstance(data, list): + data = data[0] + response = provider_helper.get_response(data, request_parameters) + return TextGenerationOutput.parse_obj_as_instance(response) if details else response["generated_text"] + + def text_to_image( + self, + prompt: str, + *, + negative_prompt: Optional[str] = None, + height: Optional[int] = None, + width: Optional[int] = None, + num_inference_steps: Optional[int] = None, + guidance_scale: Optional[float] = None, + model: Optional[str] = None, + scheduler: Optional[str] = None, + seed: Optional[int] = None, + extra_body: Optional[Dict[str, Any]] = None, + ) -> "Image": + """ + Generate an image based on a given text using a specified model. + + > [!WARNING] + > You must have `PIL` installed if you want to work with images (`pip install Pillow`). + + > [!TIP] + > You can pass provider-specific parameters to the model by using the `extra_body` argument. + + Args: + prompt (`str`): + The prompt to generate an image from. + negative_prompt (`str`, *optional*): + One prompt to guide what NOT to include in image generation. + height (`int`, *optional*): + The height in pixels of the output image + width (`int`, *optional*): + The width in pixels of the output image + num_inference_steps (`int`, *optional*): + The number of denoising steps. More denoising steps usually lead to a higher quality image at the + expense of slower inference. + guidance_scale (`float`, *optional*): + A higher guidance scale value encourages the model to generate images closely linked to the text + prompt, but values too high may cause saturation and other artifacts. + model (`str`, *optional*): + The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed + Inference Endpoint. If not provided, the default recommended text-to-image model will be used. + Defaults to None. + scheduler (`str`, *optional*): + Override the scheduler with a compatible one. + seed (`int`, *optional*): + Seed for the random number generator. + extra_body (`Dict[str, Any]`, *optional*): + Additional provider-specific parameters to pass to the model. Refer to the provider's documentation + for supported parameters. + + Returns: + `Image`: The generated image. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `HTTPError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient() + + >>> image = client.text_to_image("An astronaut riding a horse on the moon.") + >>> image.save("astronaut.png") + + >>> image = client.text_to_image( + ... "An astronaut riding a horse on the moon.", + ... negative_prompt="low resolution, blurry", + ... model="stabilityai/stable-diffusion-2-1", + ... ) + >>> image.save("better_astronaut.png") + ``` + Example using a third-party provider directly. Usage will be billed on your fal.ai account. + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient( + ... provider="fal-ai", # Use fal.ai provider + ... api_key="fal-ai-api-key", # Pass your fal.ai API key + ... ) + >>> image = client.text_to_image( + ... "A majestic lion in a fantasy forest", + ... model="black-forest-labs/FLUX.1-schnell", + ... ) + >>> image.save("lion.png") + ``` + + Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account. 
+
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient(
+ ... provider="replicate", # Use replicate provider
+ ... api_key="hf_...", # Pass your HF token
+ ... )
+ >>> image = client.text_to_image(
+ ... "An astronaut riding a horse on the moon.",
+ ... model="black-forest-labs/FLUX.1-dev",
+ ... )
+ >>> image.save("astronaut.png")
+ ```
+
+ Example using Replicate provider with extra parameters
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient(
+ ... provider="replicate", # Use replicate provider
+ ... api_key="hf_...", # Pass your HF token
+ ... )
+ >>> image = client.text_to_image(
+ ... "An astronaut riding a horse on the moon.",
+ ... model="black-forest-labs/FLUX.1-schnell",
+ ... extra_body={"output_quality": 100},
+ ... )
+ >>> image.save("astronaut.png")
+ ```
+ """
+ model_id = model or self.model
+ provider_helper = get_provider_helper(self.provider, task="text-to-image", model=model_id)
+ request_parameters = provider_helper.prepare_request(
+ inputs=prompt,
+ parameters={
+ "negative_prompt": negative_prompt,
+ "height": height,
+ "width": width,
+ "num_inference_steps": num_inference_steps,
+ "guidance_scale": guidance_scale,
+ "scheduler": scheduler,
+ "seed": seed,
+ **(extra_body or {}),
+ },
+ headers=self.headers,
+ model=model_id,
+ api_key=self.token,
+ )
+ response = self._inner_post(request_parameters)
+ response = provider_helper.get_response(response)
+ return _bytes_to_image(response)
+
+ def text_to_video(
+ self,
+ prompt: str,
+ *,
+ model: Optional[str] = None,
+ guidance_scale: Optional[float] = None,
+ negative_prompt: Optional[List[str]] = None,
+ num_frames: Optional[float] = None,
+ num_inference_steps: Optional[int] = None,
+ seed: Optional[int] = None,
+ extra_body: Optional[Dict[str, Any]] = None,
+ ) -> bytes:
+ """
+ Generate a video based on a given text.
+
+ > [!TIP]
+ > You can pass provider-specific parameters to the model by using the `extra_body` argument.
+
+ Args:
+ prompt (`str`):
+ The prompt to generate a video from.
+ model (`str`, *optional*):
+ The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
+ Inference Endpoint. If not provided, the default recommended text-to-video model will be used.
+ Defaults to None.
+ guidance_scale (`float`, *optional*):
+ A higher guidance scale value encourages the model to generate videos closely linked to the text
+ prompt, but values too high may cause saturation and other artifacts.
+ negative_prompt (`List[str]`, *optional*):
+ One or several prompts to guide what NOT to include in video generation.
+ num_frames (`float`, *optional*):
+ The num_frames parameter determines how many video frames are generated.
+ num_inference_steps (`int`, *optional*):
+ The number of denoising steps. More denoising steps usually lead to a higher quality video at the
+ expense of slower inference.
+ seed (`int`, *optional*):
+ Seed for the random number generator.
+ extra_body (`Dict[str, Any]`, *optional*):
+ Additional provider-specific parameters to pass to the model. Refer to the provider's documentation
+ for supported parameters.
+
+ Returns:
+ `bytes`: The generated video.
+
+ Example:
+
+ Example using a third-party provider directly. Usage will be billed on your fal.ai account.
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient(
+ ... provider="fal-ai", # Using fal.ai provider
+ ... api_key="fal-ai-api-key", # Pass your fal.ai API key
+ ...
) + >>> video = client.text_to_video( + ... "A majestic lion running in a fantasy forest", + ... model="tencent/HunyuanVideo", + ... ) + >>> with open("lion.mp4", "wb") as file: + ... file.write(video) + ``` + + Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account. + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient( + ... provider="replicate", # Using replicate provider + ... api_key="hf_...", # Pass your HF token + ... ) + >>> video = client.text_to_video( + ... "A cat running in a park", + ... model="genmo/mochi-1-preview", + ... ) + >>> with open("cat.mp4", "wb") as file: + ... file.write(video) + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="text-to-video", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=prompt, + parameters={ + "guidance_scale": guidance_scale, + "negative_prompt": negative_prompt, + "num_frames": num_frames, + "num_inference_steps": num_inference_steps, + "seed": seed, + **(extra_body or {}), + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = self._inner_post(request_parameters) + response = provider_helper.get_response(response, request_parameters) + return response + + def text_to_speech( + self, + text: str, + *, + model: Optional[str] = None, + do_sample: Optional[bool] = None, + early_stopping: Optional[Union[bool, "TextToSpeechEarlyStoppingEnum"]] = None, + epsilon_cutoff: Optional[float] = None, + eta_cutoff: Optional[float] = None, + max_length: Optional[int] = None, + max_new_tokens: Optional[int] = None, + min_length: Optional[int] = None, + min_new_tokens: Optional[int] = None, + num_beam_groups: Optional[int] = None, + num_beams: Optional[int] = None, + penalty_alpha: Optional[float] = None, + temperature: Optional[float] = None, + top_k: Optional[int] = None, + top_p: Optional[float] = None, + typical_p: Optional[float] = None, + use_cache: Optional[bool] = None, + extra_body: Optional[Dict[str, Any]] = None, + ) -> bytes: + """ + Synthesize an audio of a voice pronouncing a given text. + + > [!TIP] + > You can pass provider-specific parameters to the model by using the `extra_body` argument. + + Args: + text (`str`): + The text to synthesize. + model (`str`, *optional*): + The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed + Inference Endpoint. If not provided, the default recommended text-to-speech model will be used. + Defaults to None. + do_sample (`bool`, *optional*): + Whether to use sampling instead of greedy decoding when generating new tokens. + early_stopping (`Union[bool, "TextToSpeechEarlyStoppingEnum"]`, *optional*): + Controls the stopping condition for beam-based methods. + epsilon_cutoff (`float`, *optional*): + If set to float strictly between 0 and 1, only tokens with a conditional probability greater than + epsilon_cutoff will be sampled. In the paper, suggested values range from 3e-4 to 9e-4, depending on + the size of the model. See [Truncation Sampling as Language Model + Desmoothing](https://hf.co/papers/2210.15191) for more details. + eta_cutoff (`float`, *optional*): + Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to float strictly + between 0 and 1, a token is only considered if it is greater than either eta_cutoff or sqrt(eta_cutoff) + * exp(-entropy(softmax(next_token_logits))). 
+ probability, scaled by sqrt(eta_cutoff). In the paper, suggested values range from 3e-4 to 2e-3,
+ depending on the size of the model. See [Truncation Sampling as Language Model
+ Desmoothing](https://hf.co/papers/2210.15191) for more details.
+ max_length (`int`, *optional*):
+ The maximum length (in tokens) of the generated text, including the input.
+ max_new_tokens (`int`, *optional*):
+ The maximum number of tokens to generate. Takes precedence over max_length.
+ min_length (`int`, *optional*):
+ The minimum length (in tokens) of the generated text, including the input.
+ min_new_tokens (`int`, *optional*):
+ The minimum number of tokens to generate. Takes precedence over min_length.
+ num_beam_groups (`int`, *optional*):
+ Number of groups to divide num_beams into in order to ensure diversity among different groups of beams.
+ See [this paper](https://hf.co/papers/1610.02424) for more details.
+ num_beams (`int`, *optional*):
+ Number of beams to use for beam search.
+ penalty_alpha (`float`, *optional*):
+ The value balances the model confidence and the degeneration penalty in contrastive search decoding.
+ temperature (`float`, *optional*):
+ The value used to modulate the next token probabilities.
+ top_k (`int`, *optional*):
+ The number of highest probability vocabulary tokens to keep for top-k-filtering.
+ top_p (`float`, *optional*):
+ If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to
+ top_p or higher are kept for generation.
+ typical_p (`float`, *optional*):
+ Local typicality measures how similar the conditional probability of predicting a target token next is
+ to the expected conditional probability of predicting a random token next, given the partial text
+ already generated. If set to float < 1, the smallest set of the most locally typical tokens with
+ probabilities that add up to typical_p or higher are kept for generation. See [this
+ paper](https://hf.co/papers/2202.00666) for more details.
+ use_cache (`bool`, *optional*):
+ Whether the model should use the past key/values attentions to speed up decoding.
+ extra_body (`Dict[str, Any]`, *optional*):
+ Additional provider-specific parameters to pass to the model. Refer to the provider's documentation
+ for supported parameters.
+
+ Returns:
+ `bytes`: The generated audio.
+
+ Raises:
+ [`InferenceTimeoutError`]:
+ If the model is unavailable or the request times out.
+ `HTTPError`:
+ If the request fails with an HTTP error status code other than HTTP 503.
+
+ Example:
+ ```py
+ >>> from pathlib import Path
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient()
+
+ >>> audio = client.text_to_speech("Hello world")
+ >>> Path("hello_world.flac").write_bytes(audio)
+ ```
+
+ Example using a third-party provider directly. Usage will be billed on your Replicate account.
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient(
+ ... provider="replicate",
+ ... api_key="your-replicate-api-key", # Pass your Replicate API key directly
+ ... )
+ >>> audio = client.text_to_speech(
+ ... text="Hello world",
+ ... model="OuteAI/OuteTTS-0.3-500M",
+ ... )
+ >>> Path("hello_world.flac").write_bytes(audio)
+ ```
+
+ Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient(
+ ... provider="replicate",
api_key="hf_...", # Pass your HF token + ... ) + >>> audio =client.text_to_speech( + ... text="Hello world", + ... model="OuteAI/OuteTTS-0.3-500M", + ... ) + >>> Path("hello_world.flac").write_bytes(audio) + ``` + Example using Replicate provider with extra parameters + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient( + ... provider="replicate", # Use replicate provider + ... api_key="hf_...", # Pass your HF token + ... ) + >>> audio = client.text_to_speech( + ... "Hello, my name is Kororo, an awesome text-to-speech model.", + ... model="hexgrad/Kokoro-82M", + ... extra_body={"voice": "af_nicole"}, + ... ) + >>> Path("hello.flac").write_bytes(audio) + ``` + + Example music-gen using "YuE-s1-7B-anneal-en-cot" on fal.ai + ```py + >>> from huggingface_hub import InferenceClient + >>> lyrics = ''' + ... [verse] + ... In the town where I was born + ... Lived a man who sailed to sea + ... And he told us of his life + ... In the land of submarines + ... So we sailed on to the sun + ... 'Til we found a sea of green + ... And we lived beneath the waves + ... In our yellow submarine + + ... [chorus] + ... We all live in a yellow submarine + ... Yellow submarine, yellow submarine + ... We all live in a yellow submarine + ... Yellow submarine, yellow submarine + ... ''' + >>> genres = "pavarotti-style tenor voice" + >>> client = InferenceClient( + ... provider="fal-ai", + ... model="m-a-p/YuE-s1-7B-anneal-en-cot", + ... api_key=..., + ... ) + >>> audio = client.text_to_speech(lyrics, extra_body={"genres": genres}) + >>> with open("output.mp3", "wb") as f: + ... f.write(audio) + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="text-to-speech", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=text, + parameters={ + "do_sample": do_sample, + "early_stopping": early_stopping, + "epsilon_cutoff": epsilon_cutoff, + "eta_cutoff": eta_cutoff, + "max_length": max_length, + "max_new_tokens": max_new_tokens, + "min_length": min_length, + "min_new_tokens": min_new_tokens, + "num_beam_groups": num_beam_groups, + "num_beams": num_beams, + "penalty_alpha": penalty_alpha, + "temperature": temperature, + "top_k": top_k, + "top_p": top_p, + "typical_p": typical_p, + "use_cache": use_cache, + **(extra_body or {}), + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = self._inner_post(request_parameters) + response = provider_helper.get_response(response) + return response + + def token_classification( + self, + text: str, + *, + model: Optional[str] = None, + aggregation_strategy: Optional["TokenClassificationAggregationStrategy"] = None, + ignore_labels: Optional[List[str]] = None, + stride: Optional[int] = None, + ) -> List[TokenClassificationOutputElement]: + """ + Perform token classification on the given text. + Usually used for sentence parsing, either grammatical, or Named Entity Recognition (NER) to understand keywords contained within text. + + Args: + text (`str`): + A string to be classified. + model (`str`, *optional*): + The model to use for the token classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. If not provided, the default recommended token classification model will be used. + Defaults to None. 
+ aggregation_strategy (`"TokenClassificationAggregationStrategy"`, *optional*):
+ The strategy used to fuse tokens based on model predictions.
+ ignore_labels (`List[str]`, *optional*):
+ A list of labels to ignore.
+ stride (`int`, *optional*):
+ The number of overlapping tokens between chunks when splitting the input text.
+
+ Returns:
+ `List[TokenClassificationOutputElement]`: List of [`TokenClassificationOutputElement`] items containing the entity group, confidence score, word, start and end index.
+
+ Raises:
+ [`InferenceTimeoutError`]:
+ If the model is unavailable or the request times out.
+ `HTTPError`:
+ If the request fails with an HTTP error status code other than HTTP 503.
+
+ Example:
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient()
+ >>> client.token_classification("My name is Sarah Jessica Parker but you can call me Jessica")
+ [
+ TokenClassificationOutputElement(
+ entity_group='PER',
+ score=0.9971321225166321,
+ word='Sarah Jessica Parker',
+ start=11,
+ end=31,
+ ),
+ TokenClassificationOutputElement(
+ entity_group='PER',
+ score=0.9773476123809814,
+ word='Jessica',
+ start=52,
+ end=59,
+ )
+ ]
+ ```
+ """
+ model_id = model or self.model
+ provider_helper = get_provider_helper(self.provider, task="token-classification", model=model_id)
+ request_parameters = provider_helper.prepare_request(
+ inputs=text,
+ parameters={
+ "aggregation_strategy": aggregation_strategy,
+ "ignore_labels": ignore_labels,
+ "stride": stride,
+ },
+ headers=self.headers,
+ model=model_id,
+ api_key=self.token,
+ )
+ response = self._inner_post(request_parameters)
+ return TokenClassificationOutputElement.parse_obj_as_list(response)
+
+ def translation(
+ self,
+ text: str,
+ *,
+ model: Optional[str] = None,
+ src_lang: Optional[str] = None,
+ tgt_lang: Optional[str] = None,
+ clean_up_tokenization_spaces: Optional[bool] = None,
+ truncation: Optional["TranslationTruncationStrategy"] = None,
+ generate_parameters: Optional[Dict[str, Any]] = None,
+ ) -> TranslationOutput:
+ """
+ Convert text from one language to another.
+
+ Check out https://huggingface.co/tasks/translation for more information on how to choose the best model for
+ your specific use case. Source and target languages usually depend on the model.
+ However, it is possible to specify source and target languages for certain models. If you are working with one of these models,
+ you can use `src_lang` and `tgt_lang` arguments to pass the relevant information.
+
+ Args:
+ text (`str`):
+ A string to be translated.
+ model (`str`, *optional*):
+ The model to use for the translation task. Can be a model ID hosted on the Hugging Face Hub or a URL to
+ a deployed Inference Endpoint. If not provided, the default recommended translation model will be used.
+ Defaults to None.
+ src_lang (`str`, *optional*):
+ The source language of the text. Required for models that can translate from multiple languages.
+ tgt_lang (`str`, *optional*):
+ Target language to translate to. Required for models that can translate to multiple languages.
+ clean_up_tokenization_spaces (`bool`, *optional*):
+ Whether to clean up the potential extra spaces in the text output.
+ truncation (`"TranslationTruncationStrategy"`, *optional*):
+ The truncation strategy to use.
+ generate_parameters (`Dict[str, Any]`, *optional*):
+ Additional parametrization of the text generation algorithm.
+
+ Returns:
+ [`TranslationOutput`]: The generated translated text.
+
+ Raises:
+ [`InferenceTimeoutError`]:
+ If the model is unavailable or the request times out.
+ `HTTPError`:
+ If the request fails with an HTTP error status code other than HTTP 503.
+ `ValueError`:
+ If only one of the `src_lang` and `tgt_lang` arguments is provided.
+
+ Example:
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient()
+ >>> client.translation("My name is Wolfgang and I live in Berlin")
+ 'Mein Name ist Wolfgang und ich lebe in Berlin.'
+ >>> client.translation("My name is Wolfgang and I live in Berlin", model="Helsinki-NLP/opus-mt-en-fr")
+ TranslationOutput(translation_text="Je m'appelle Wolfgang et je vis à Berlin.")
+ ```
+
+ Specifying languages:
+ ```py
+ >>> client.translation("My name is Sarah Jessica Parker but you can call me Jessica", model="facebook/mbart-large-50-many-to-many-mmt", src_lang="en_XX", tgt_lang="fr_XX")
+ "Mon nom est Sarah Jessica Parker mais vous pouvez m'appeler Jessica"
+ ```
+ """
+ # Throw error if only one of `src_lang` and `tgt_lang` was given
+ if src_lang is not None and tgt_lang is None:
+ raise ValueError("You cannot specify `src_lang` without specifying `tgt_lang`.")
+
+ if src_lang is None and tgt_lang is not None:
+ raise ValueError("You cannot specify `tgt_lang` without specifying `src_lang`.")
+
+ model_id = model or self.model
+ provider_helper = get_provider_helper(self.provider, task="translation", model=model_id)
+ request_parameters = provider_helper.prepare_request(
+ inputs=text,
+ parameters={
+ "src_lang": src_lang,
+ "tgt_lang": tgt_lang,
+ "clean_up_tokenization_spaces": clean_up_tokenization_spaces,
+ "truncation": truncation,
+ "generate_parameters": generate_parameters,
+ },
+ headers=self.headers,
+ model=model_id,
+ api_key=self.token,
+ )
+ response = self._inner_post(request_parameters)
+ return TranslationOutput.parse_obj_as_list(response)[0]
+
+ def visual_question_answering(
+ self,
+ image: ContentT,
+ question: str,
+ *,
+ model: Optional[str] = None,
+ top_k: Optional[int] = None,
+ ) -> List[VisualQuestionAnsweringOutputElement]:
+ """
+ Answer open-ended questions based on an image.
+
+ Args:
+ image (`Union[str, Path, bytes, BinaryIO, PIL.Image.Image]`):
+ The input image for the context. It can be raw bytes, an image file, a URL to an online image, or a PIL Image.
+ question (`str`):
+ Question to be answered.
+ model (`str`, *optional*):
+ The model to use for the visual question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to
+ a deployed Inference Endpoint. If not provided, the default recommended visual question answering model will be used.
+ Defaults to None.
+ top_k (`int`, *optional*):
+ The number of answers to return (will be chosen by order of likelihood). Note that we return fewer than
+ top_k answers if there are not enough options available within the context.
+
+ Returns:
+ `List[VisualQuestionAnsweringOutputElement]`: a list of [`VisualQuestionAnsweringOutputElement`] items containing the predicted label and associated probability.
+
+ Raises:
+ `InferenceTimeoutError`:
+ If the model is unavailable or the request times out.
+ `HTTPError`:
+ If the request fails with an HTTP error status code other than HTTP 503.
+
+ Example:
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient()
+ >>> client.visual_question_answering(
+ ... image="https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg",
+ ... question="What is the animal doing?"
+ ... )
+ [
+ VisualQuestionAnsweringOutputElement(score=0.778609573841095, answer='laying down'),
+ VisualQuestionAnsweringOutputElement(score=0.6957435607910156, answer='sitting'),
+ ]
+ ```
+ """
+ model_id = model or self.model
+ provider_helper = get_provider_helper(self.provider, task="visual-question-answering", model=model_id)
+ request_parameters = provider_helper.prepare_request(
+ inputs=image,
+ parameters={"top_k": top_k},
+ headers=self.headers,
+ model=model_id,
+ api_key=self.token,
+ extra_payload={"question": question, "image": _b64_encode(image)},
+ )
+ response = self._inner_post(request_parameters)
+ return VisualQuestionAnsweringOutputElement.parse_obj_as_list(response)
+
+ def zero_shot_classification(
+ self,
+ text: str,
+ candidate_labels: List[str],
+ *,
+ multi_label: Optional[bool] = False,
+ hypothesis_template: Optional[str] = None,
+ model: Optional[str] = None,
+ ) -> List[ZeroShotClassificationOutputElement]:
+ """
+ Provide as input a text and a set of candidate labels to classify the input text.
+
+ Args:
+ text (`str`):
+ The input text to classify.
+ candidate_labels (`List[str]`):
+ The set of possible class labels to classify the text into.
+ multi_label (`bool`, *optional*):
+ Whether multiple candidate labels can be true. If false, the scores are normalized such that the sum of
+ the label likelihoods for each sequence is 1. If true, the labels are considered independent and
+ probabilities are normalized for each candidate.
+ hypothesis_template (`str`, *optional*):
+ The sentence used in conjunction with `candidate_labels` to attempt the text classification by
+ replacing the placeholder with the candidate labels.
+ model (`str`, *optional*):
+ The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
+ Inference Endpoint. This parameter overrides the model defined at the instance level. If not provided, the default recommended zero-shot classification model will be used.
+
+ Returns:
+ `List[ZeroShotClassificationOutputElement]`: List of [`ZeroShotClassificationOutputElement`] items containing the predicted labels and their confidence.
+
+ Raises:
+ [`InferenceTimeoutError`]:
+ If the model is unavailable or the request times out.
+ `HTTPError`:
+ If the request fails with an HTTP error status code other than HTTP 503.
+
+ Example with `multi_label=False`:
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient()
+ >>> text = (
+ ... "A new model offers an explanation for how the Galilean satellites formed around the solar system's"
+ ... " largest world. Konstantin Batygin did not set out to solve one of the solar system's most puzzling"
+ ... " mysteries when he went for a run up a hill in Nice, France."
+ ... )
+ >>> labels = ["space & cosmos", "scientific discovery", "microbiology", "robots", "archeology"]
+ >>> client.zero_shot_classification(text, labels)
+ [
+ ZeroShotClassificationOutputElement(label='scientific discovery', score=0.7961668968200684),
+ ZeroShotClassificationOutputElement(label='space & cosmos', score=0.18570658564567566),
+ ZeroShotClassificationOutputElement(label='microbiology', score=0.00730885099619627),
+ ZeroShotClassificationOutputElement(label='archeology', score=0.006258360575884581),
+ ZeroShotClassificationOutputElement(label='robots', score=0.004559356719255447),
+ ]
+ >>> client.zero_shot_classification(text, labels, multi_label=True)
+ [
+ ZeroShotClassificationOutputElement(label='scientific discovery', score=0.9829297661781311),
+ ZeroShotClassificationOutputElement(label='space & cosmos', score=0.755190908908844),
+ ZeroShotClassificationOutputElement(label='microbiology', score=0.0005462635890580714),
+ ZeroShotClassificationOutputElement(label='archeology', score=0.00047131875180639327),
+ ZeroShotClassificationOutputElement(label='robots', score=0.00030448526376858354),
+ ]
+ ```
+
+ Example with `multi_label=True` and a custom `hypothesis_template`:
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient()
+ >>> client.zero_shot_classification(
+ ... text="I really like our dinner and I'm very happy. I don't like the weather though.",
+ ... candidate_labels=["positive", "negative", "pessimistic", "optimistic"],
+ ... multi_label=True,
+ ... hypothesis_template="This text is {} towards the weather"
+ ... )
+ [
+ ZeroShotClassificationOutputElement(label='negative', score=0.9231801629066467),
+ ZeroShotClassificationOutputElement(label='pessimistic', score=0.8760990500450134),
+ ZeroShotClassificationOutputElement(label='optimistic', score=0.0008674879791215062),
+ ZeroShotClassificationOutputElement(label='positive', score=0.0005250611575320363)
+ ]
+ ```
+ """
+ model_id = model or self.model
+ provider_helper = get_provider_helper(self.provider, task="zero-shot-classification", model=model_id)
+ request_parameters = provider_helper.prepare_request(
+ inputs=text,
+ parameters={
+ "candidate_labels": candidate_labels,
+ "multi_label": multi_label,
+ "hypothesis_template": hypothesis_template,
+ },
+ headers=self.headers,
+ model=model_id,
+ api_key=self.token,
+ )
+ response = self._inner_post(request_parameters)
+ output = _bytes_to_dict(response)
+ return [
+ ZeroShotClassificationOutputElement.parse_obj_as_instance({"label": label, "score": score})
+ for label, score in zip(output["labels"], output["scores"])
+ ]
+
+ def zero_shot_image_classification(
+ self,
+ image: ContentT,
+ candidate_labels: List[str],
+ *,
+ model: Optional[str] = None,
+ hypothesis_template: Optional[str] = None,
+ # deprecated argument
+ labels: List[str] = None, # type: ignore
+ ) -> List[ZeroShotImageClassificationOutputElement]:
+ """
+ Provide input image and text labels to predict text labels for the image.
+
+ Args:
+ image (`Union[str, Path, bytes, BinaryIO, PIL.Image.Image]`):
+ The input image to caption. It can be raw bytes, an image file, a URL to an online image, or a PIL Image.
+ candidate_labels (`List[str]`):
+ The candidate labels for this image.
+ labels (`List[str]`, *optional*):
+ (deprecated) List of string possible labels. There must be at least 2 labels.
+ model (`str`, *optional*):
+ The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
+ Inference Endpoint. This parameter overrides the model defined at the instance level. If not provided, the default recommended zero-shot image classification model will be used.
+ hypothesis_template (`str`, *optional*):
+ The sentence used in conjunction with `candidate_labels` to attempt the image classification by
+ replacing the placeholder with the candidate labels.
+
+ Returns:
+ `List[ZeroShotImageClassificationOutputElement]`: List of [`ZeroShotImageClassificationOutputElement`] items containing the predicted labels and their confidence.
+
+ Raises:
+ [`InferenceTimeoutError`]:
+ If the model is unavailable or the request times out.
+ `HTTPError`:
+ If the request fails with an HTTP error status code other than HTTP 503.
+
+ Example:
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient()
+
+ >>> client.zero_shot_image_classification(
+ ... "https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg",
+ ... candidate_labels=["dog", "cat", "horse"],
+ ... )
+ [ZeroShotImageClassificationOutputElement(label='dog', score=0.956),...]
+ ```
+ """
+ # Raise ValueError if input is fewer than 2 labels
+ if len(candidate_labels) < 2:
+ raise ValueError("You must specify at least 2 classes to compare.")
+
+ model_id = model or self.model
+ provider_helper = get_provider_helper(self.provider, task="zero-shot-image-classification", model=model_id)
+ request_parameters = provider_helper.prepare_request(
+ inputs=image,
+ parameters={
+ "candidate_labels": candidate_labels,
+ "hypothesis_template": hypothesis_template,
+ },
+ headers=self.headers,
+ model=model_id,
+ api_key=self.token,
+ )
+ response = self._inner_post(request_parameters)
+ return ZeroShotImageClassificationOutputElement.parse_obj_as_list(response)
+
+ def get_endpoint_info(self, *, model: Optional[str] = None) -> Dict[str, Any]:
+ """
+ Get information about the deployed endpoint.
+
+ This endpoint is only available on endpoints powered by Text-Generation-Inference (TGI) or Text-Embedding-Inference (TEI).
+ Endpoints powered by `transformers` return an empty payload.
+
+ Args:
+ model (`str`, *optional*):
+ The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
+ Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
+
+ Returns:
+ `Dict[str, Any]`: Information about the endpoint.
+
+ Example:
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")
+ >>> client.get_endpoint_info()
+ {
+ 'model_id': 'meta-llama/Meta-Llama-3-70B-Instruct',
+ 'model_sha': None,
+ 'model_dtype': 'torch.float16',
+ 'model_device_type': 'cuda',
+ 'model_pipeline_tag': None,
+ 'max_concurrent_requests': 128,
+ 'max_best_of': 2,
+ 'max_stop_sequences': 4,
+ 'max_input_length': 8191,
+ 'max_total_tokens': 8192,
+ 'waiting_served_ratio': 0.3,
+ 'max_batch_total_tokens': 1259392,
+ 'max_waiting_tokens': 20,
+ 'max_batch_size': None,
+ 'validation_workers': 32,
+ 'max_client_batch_size': 4,
+ 'version': '2.0.2',
+ 'sha': 'dccab72549635c7eb5ddb17f43f0b7cdff07c214',
+ 'docker_label': 'sha-dccab72'
+ }
+ ```
+ """
+ if self.provider != "hf-inference":
+ raise ValueError(f"Getting endpoint info is not supported on '{self.provider}'.")
+
+ model = model or self.model
+ if model is None:
+ raise ValueError("Model id not provided.")
+ if model.startswith(("http://", "https://")):
+ url = model.rstrip("/") + "/info"
+ else:
+ url = f"{constants.INFERENCE_ENDPOINT}/models/{model}/info"
+
+ response = get_session().get(url, headers=build_hf_headers(token=self.token))
+ hf_raise_for_status(response)
+ return response.json()
+
+ def health_check(self, model: Optional[str] = None) -> bool:
+ """
+ Check the health of the deployed endpoint.
+
+ Health check is only available with Inference Endpoints powered by Text-Generation-Inference (TGI) or Text-Embedding-Inference (TEI).
+
+ Args:
+ model (`str`, *optional*):
+ URL of the Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
+
+ Returns:
+ `bool`: True if everything is working fine.
+
+ Example:
+ ```py
+ >>> from huggingface_hub import InferenceClient
+ >>> client = InferenceClient("https://jzgu0buei5.us-east-1.aws.endpoints.huggingface.cloud")
+ >>> client.health_check()
+ True
+ ```
+ """
+ if self.provider != "hf-inference":
+ raise ValueError(f"Health check is not supported on '{self.provider}'.")
+
+ model = model or self.model
+ if model is None:
+ raise ValueError("Model id not provided.")
+ if not model.startswith(("http://", "https://")):
+ raise ValueError("Model must be an Inference Endpoint URL.")
+ url = model.rstrip("/") + "/health"
+
+ response = get_session().get(url, headers=build_hf_headers(token=self.token))
+ return response.status_code == 200
+
+ @property
+ def chat(self) -> "ProxyClientChat":
+ return ProxyClientChat(self)
+
+
+class _ProxyClient:
+ """Proxy class to be able to call `client.chat.completions.create(...)` as OpenAI client."""
+
+ def __init__(self, client: InferenceClient):
+ self._client = client
+
+
+class ProxyClientChat(_ProxyClient):
+ """Proxy class to be able to call `client.chat.completions.create(...)` as OpenAI client."""
+
+ @property
+ def completions(self) -> "ProxyClientChatCompletions":
+ return ProxyClientChatCompletions(self._client)
+
+
+class ProxyClientChatCompletions(_ProxyClient):
+ """Proxy class to be able to call `client.chat.completions.create(...)` as OpenAI client."""
+
+ @property
+ def create(self):
+ return self._client.chat_completion
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_common.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_common.py
new file mode 100644
index 0000000000000000000000000000000000000000..c7803d14eee9161739f25f9fb5914a35469be0ff
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_common.py
@@ -0,0 +1,459 @@
+# coding=utf-8
+# Copyright 2023-present, the HuggingFace Inc. team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Contains utilities used by both the sync and async inference clients."""
+
+import base64
+import io
+import json
+import logging
+import mimetypes
+from dataclasses import dataclass
+from pathlib import Path
+from typing import (
+ TYPE_CHECKING,
+ Any,
+ AsyncIterable,
+ BinaryIO,
+ Dict,
+ Iterable,
+ List,
+ Literal,
+ NoReturn,
+ Optional,
+ Union,
+ overload,
+)
+
+from requests import HTTPError
+
+from huggingface_hub.errors import (
+ GenerationError,
+ IncompleteGenerationError,
+ OverloadedError,
+ TextGenerationError,
+ UnknownError,
+ ValidationError,
+)
+
+from ..utils import get_session, is_aiohttp_available, is_numpy_available, is_pillow_available
+from ._generated.types import ChatCompletionStreamOutput, TextGenerationStreamOutput
+
+
+if TYPE_CHECKING:
+ from aiohttp import ClientResponse, ClientSession
+ from PIL.Image import Image
+
+# TYPES
+UrlT = str
+PathT = Union[str, Path]
+ContentT = Union[bytes, BinaryIO, PathT, UrlT, "Image", bytearray, memoryview]
+
+# Used to set an Accept: image/png header
+TASKS_EXPECTING_IMAGES = {"text-to-image", "image-to-image"}
+
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class RequestParameters:
+ url: str
+ task: str
+ model: Optional[str]
+ json: Optional[Union[str, Dict, List]]
+ data: Optional[bytes]
+ headers: Dict[str, Any]
+
+
+class MimeBytes(bytes):
+ """
+ A bytes object with a mime type.
+ To be returned by `_prepare_payload_open_as_mime_bytes` in subclasses.
+
+ Example:
+ ```python
+ >>> b = MimeBytes(b"hello", "text/plain")
+ >>> isinstance(b, bytes)
+ True
+ >>> b.mime_type
+ 'text/plain'
+ ```
+ """
+
+ mime_type: Optional[str]
+
+ def __new__(cls, data: bytes, mime_type: Optional[str] = None):
+ obj = super().__new__(cls, data)
+ obj.mime_type = mime_type
+ if isinstance(data, MimeBytes) and mime_type is None:
+ obj.mime_type = data.mime_type
+ return obj
+
+
+## IMPORT UTILS
+
+
+def _import_aiohttp():
+ # Make sure `aiohttp` is installed on the machine.
+ if not is_aiohttp_available():
+ raise ImportError("Please install aiohttp to use `AsyncInferenceClient` (`pip install aiohttp`).")
+ import aiohttp
+
+ return aiohttp
+
+
+def _import_numpy():
+ """Make sure `numpy` is installed on the machine."""
+ if not is_numpy_available():
+ raise ImportError("Please install numpy to deal with embeddings (`pip install numpy`).")
+ import numpy
+
+ return numpy
+
+
+def _import_pil_image():
+ """Make sure `PIL` is installed on the machine."""
+ if not is_pillow_available():
+ raise ImportError(
+ "Please install Pillow to deal with images (`pip install Pillow`). If you don't want the image to be"
+ " post-processed, use `client.post(...)` and get the raw response from the server."
+ )
+ from PIL import Image
+
+ return Image
+
+
+## ENCODING / DECODING UTILS
+
+
+@overload
+def _open_as_mime_bytes(content: ContentT) -> MimeBytes: ... # means "if input is not None, output is not None"
+
+
+@overload
+def _open_as_mime_bytes(content: Literal[None]) -> Literal[None]: ... # means "if input is None, output is None"
+
+
+def _open_as_mime_bytes(content: Optional[ContentT]) -> Optional[MimeBytes]:
+ """Open `content` as a binary file, either from a URL, a local path, raw bytes, or a PIL Image.
+
+ Do nothing if `content` is None.
+ """
+ # If content is None, return None
+ if content is None:
+ return None
+
+ # If content is bytes, return it
+ if isinstance(content, bytes):
+ return MimeBytes(content)
+
+ # If content is raw binary data (bytearray, memoryview)
+ if isinstance(content, (bytearray, memoryview)):
+ return MimeBytes(bytes(content))
+
+ # If content is a binary file-like object
+ if hasattr(content, "read"): # duck-typing instead of isinstance(content, BinaryIO)
+ logger.debug("Reading content from BinaryIO")
+ data = content.read()
+ mime_type = mimetypes.guess_type(content.name)[0] if hasattr(content, "name") else None
+ if isinstance(data, str):
+ raise TypeError("Expected binary stream (bytes), but got text stream")
+ return MimeBytes(data, mime_type=mime_type)
+
+ # If content is a string => must be either a URL or a path
+ if isinstance(content, str):
+ if content.startswith("https://") or content.startswith("http://"):
+ logger.debug(f"Downloading content from {content}")
+ response = get_session().get(content)
+ mime_type = response.headers.get("Content-Type")
+ if mime_type is None:
+ mime_type = mimetypes.guess_type(content)[0]
+ return MimeBytes(response.content, mime_type=mime_type)
+
+ content = Path(content)
+ if not content.exists():
+ raise FileNotFoundError(
+ f"File not found at {content}. If `data` is a string, it must either be a URL or a path to a local"
+ " file. To pass raw content, please encode it as bytes first."
+ )
+
+ # If content is a Path => open it
+ if isinstance(content, Path):
+ logger.debug(f"Opening content from {content}")
+ return MimeBytes(content.read_bytes(), mime_type=mimetypes.guess_type(content)[0])
+
+ # If content is a PIL Image => convert to bytes
+ if is_pillow_available():
+ from PIL import Image
+
+ if isinstance(content, Image.Image):
+ logger.debug("Converting PIL Image to bytes")
+ buffer = io.BytesIO()
+ format = content.format or "PNG"
+ content.save(buffer, format=format)
+ return MimeBytes(buffer.getvalue(), mime_type=f"image/{format.lower()}")
+
+ # If nothing matched, raise error
+ raise TypeError(
+ f"Unsupported content type: {type(content)}. "
+ "Expected one of: bytes, bytearray, BinaryIO, memoryview, Path, str (URL or file path), or PIL.Image.Image."
+ )
+
+
+def _b64_encode(content: ContentT) -> str:
+ """Encode a raw file (image, audio) into base64.
Can be bytes, an opened file, a path or a URL.""" + raw_bytes = _open_as_mime_bytes(content) + return base64.b64encode(raw_bytes).decode() + + +def _as_url(content: ContentT, default_mime_type: str) -> str: + if isinstance(content, str) and content.startswith(("http://", "https://", "data:")): + return content + + # Convert content to bytes + raw_bytes = _open_as_mime_bytes(content) + + # Get MIME type + mime_type = raw_bytes.mime_type or default_mime_type + + # Encode content to base64 + encoded_data = base64.b64encode(raw_bytes).decode() + + # Build data URL + return f"data:{mime_type};base64,{encoded_data}" + + +def _b64_to_image(encoded_image: str) -> "Image": + """Parse a base64-encoded string into a PIL Image.""" + Image = _import_pil_image() + return Image.open(io.BytesIO(base64.b64decode(encoded_image))) + + +def _bytes_to_list(content: bytes) -> List: + """Parse bytes from a Response object into a Python list. + + Expects the response body to be JSON-encoded data. + + NOTE: This is exactly the same implementation as `_bytes_to_dict` and will not complain if the returned data is a + dictionary. The only advantage of having both is to help the user (and mypy) understand what kind of data to expect. + """ + return json.loads(content.decode()) + + +def _bytes_to_dict(content: bytes) -> Dict: + """Parse bytes from a Response object into a Python dictionary. + + Expects the response body to be JSON-encoded data. + + NOTE: This is exactly the same implementation as `_bytes_to_list` and will not complain if the returned data is a + list. The only advantage of having both is to help the user (and mypy) understand what kind of data to expect. + """ + return json.loads(content.decode()) + + +def _bytes_to_image(content: bytes) -> "Image": + """Parse bytes from a Response object into a PIL Image. + + Expects the response body to be raw bytes. To deal with b64 encoded images, use `_b64_to_image` instead. 
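+
+ Example (an illustrative sketch, not taken from the library docs; `png_bytes` is an assumed variable holding a PNG-encoded payload and the size shown is arbitrary):
+ ```python
+ >>> image = _bytes_to_image(png_bytes)
+ >>> image.size
+ (640, 480)
+ ```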
+ """ + Image = _import_pil_image() + return Image.open(io.BytesIO(content)) + + +def _as_dict(response: Union[bytes, Dict]) -> Dict: + return json.loads(response) if isinstance(response, bytes) else response + + +## STREAMING UTILS + + +def _stream_text_generation_response( + bytes_output_as_lines: Iterable[bytes], details: bool +) -> Union[Iterable[str], Iterable[TextGenerationStreamOutput]]: + """Used in `InferenceClient.text_generation`.""" + # Parse ServerSentEvents + for byte_payload in bytes_output_as_lines: + try: + output = _format_text_generation_stream_output(byte_payload, details) + except StopIteration: + break + if output is not None: + yield output + + +async def _async_stream_text_generation_response( + bytes_output_as_lines: AsyncIterable[bytes], details: bool +) -> Union[AsyncIterable[str], AsyncIterable[TextGenerationStreamOutput]]: + """Used in `AsyncInferenceClient.text_generation`.""" + # Parse ServerSentEvents + async for byte_payload in bytes_output_as_lines: + try: + output = _format_text_generation_stream_output(byte_payload, details) + except StopIteration: + break + if output is not None: + yield output + + +def _format_text_generation_stream_output( + byte_payload: bytes, details: bool +) -> Optional[Union[str, TextGenerationStreamOutput]]: + if not byte_payload.startswith(b"data:"): + return None # empty line + + if byte_payload.strip() == b"data: [DONE]": + raise StopIteration("[DONE] signal received.") + + # Decode payload + payload = byte_payload.decode("utf-8") + json_payload = json.loads(payload.lstrip("data:").rstrip("/n")) + + # Either an error as being returned + if json_payload.get("error") is not None: + raise _parse_text_generation_error(json_payload["error"], json_payload.get("error_type")) + + # Or parse token payload + output = TextGenerationStreamOutput.parse_obj_as_instance(json_payload) + return output.token.text if not details else output + + +def _stream_chat_completion_response( + bytes_lines: Iterable[bytes], +) -> Iterable[ChatCompletionStreamOutput]: + """Used in `InferenceClient.chat_completion` if model is served with TGI.""" + for item in bytes_lines: + try: + output = _format_chat_completion_stream_output(item) + except StopIteration: + break + if output is not None: + yield output + + +async def _async_stream_chat_completion_response( + bytes_lines: AsyncIterable[bytes], +) -> AsyncIterable[ChatCompletionStreamOutput]: + """Used in `AsyncInferenceClient.chat_completion`.""" + async for item in bytes_lines: + try: + output = _format_chat_completion_stream_output(item) + except StopIteration: + break + if output is not None: + yield output + + +def _format_chat_completion_stream_output( + byte_payload: bytes, +) -> Optional[ChatCompletionStreamOutput]: + if not byte_payload.startswith(b"data:"): + return None # empty line + + if byte_payload.strip() == b"data: [DONE]": + raise StopIteration("[DONE] signal received.") + + # Decode payload + payload = byte_payload.decode("utf-8") + json_payload = json.loads(payload.lstrip("data:").rstrip("/n")) + + # Either an error as being returned + if json_payload.get("error") is not None: + raise _parse_text_generation_error(json_payload["error"], json_payload.get("error_type")) + + # Or parse token payload + return ChatCompletionStreamOutput.parse_obj_as_instance(json_payload) + + +async def _async_yield_from(client: "ClientSession", response: "ClientResponse") -> AsyncIterable[bytes]: + try: + async for byte_payload in response.content: + yield byte_payload.strip() + finally: + # Always close the 
+ await client.close()
+
+
+# "TGI servers" are servers running with the `text-generation-inference` backend.
+# This backend is the go-to solution to run large language models at scale. However,
+# for some smaller models (e.g. "gpt2") the default `transformers` + `api-inference`
+# solution is still in use.
+#
+# Both approaches have very similar APIs, but not exactly the same. What we do first in
+# the `text_generation` method is to assume the model is served via TGI. If we realize
+# it's not the case (i.e. we receive an HTTP 400 Bad Request), we fall back to the
+# default API with a warning message. When that's the case, we remember the unsupported
+# attributes for this model in the `_UNSUPPORTED_TEXT_GENERATION_KWARGS` global variable.
+#
+# In addition, TGI servers have a built-in API route for chat-completion, which is not
+# available on the default API. We use this route to provide a more consistent behavior
+# when available.
+#
+# For more details, see https://github.com/huggingface/text-generation-inference and
+# https://huggingface.co/docs/api-inference/detailed_parameters#text-generation-task.
+
+_UNSUPPORTED_TEXT_GENERATION_KWARGS: Dict[Optional[str], List[str]] = {}
+
+
+def _set_unsupported_text_generation_kwargs(model: Optional[str], unsupported_kwargs: List[str]) -> None:
+ _UNSUPPORTED_TEXT_GENERATION_KWARGS.setdefault(model, []).extend(unsupported_kwargs)
+
+
+def _get_unsupported_text_generation_kwargs(model: Optional[str]) -> List[str]:
+ return _UNSUPPORTED_TEXT_GENERATION_KWARGS.get(model, [])
+
+
+# TEXT GENERATION ERRORS
+# ----------------------
+# Text-generation errors are parsed separately to handle as much as possible the errors returned by the text generation
+# inference project (https://github.com/huggingface/text-generation-inference).
+# ----------------------
+
+
+def raise_text_generation_error(http_error: HTTPError) -> NoReturn:
+ """
+ Try to parse text-generation-inference error message and raise HTTPError in any case.
+
+ Args:
+ http_error (`HTTPError`):
+ The HTTPError that has been raised.
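+
+ Example (a sketch of the intended call pattern, with `response` as an assumed `requests.Response`):
+ ```python
+ from huggingface_hub.utils import hf_raise_for_status
+
+ try:
+     hf_raise_for_status(response)
+ except HTTPError as error:
+     raise_text_generation_error(error)  # re-raised as a specific TextGenerationError subclass when possible
+ ```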
+ """ + # Try to parse a Text Generation Inference error + + try: + # Hacky way to retrieve payload in case of aiohttp error + payload = getattr(http_error, "response_error_payload", None) or http_error.response.json() + error = payload.get("error") + error_type = payload.get("error_type") + except Exception: # no payload + raise http_error + + # If error_type => more information than `hf_raise_for_status` + if error_type is not None: + exception = _parse_text_generation_error(error, error_type) + raise exception from http_error + + # Otherwise, fallback to default error + raise http_error + + +def _parse_text_generation_error(error: Optional[str], error_type: Optional[str]) -> TextGenerationError: + if error_type == "generation": + return GenerationError(error) # type: ignore + if error_type == "incomplete_generation": + return IncompleteGenerationError(error) # type: ignore + if error_type == "overloaded": + return OverloadedError(error) # type: ignore + if error_type == "validation": + return ValidationError(error) # type: ignore + return UnknownError(error) # type: ignore diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..badbe3e2a2817d8c42315996b168df0cfe8716fa Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/_async_client.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/_async_client.py new file mode 100644 index 0000000000000000000000000000000000000000..45285d8390cb0d8ab1a3b9cc6a0ce0d01f95b6c8 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/_async_client.py @@ -0,0 +1,3478 @@ +# coding=utf-8 +# Copyright 2023-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# WARNING +# This entire file has been adapted from the sync-client code in `src/huggingface_hub/inference/_client.py`. +# Any change in InferenceClient will be automatically reflected in AsyncInferenceClient. +# To re-generate the code, run `make style` or `python ./utils/generate_async_inference_client.py --update`. 
+# WARNING +import asyncio +import base64 +import logging +import re +import warnings +from typing import TYPE_CHECKING, Any, AsyncIterable, Dict, List, Literal, Optional, Set, Union, overload + +from huggingface_hub import constants +from huggingface_hub.errors import InferenceTimeoutError +from huggingface_hub.inference._common import ( + TASKS_EXPECTING_IMAGES, + ContentT, + RequestParameters, + _async_stream_chat_completion_response, + _async_stream_text_generation_response, + _b64_encode, + _b64_to_image, + _bytes_to_dict, + _bytes_to_image, + _bytes_to_list, + _get_unsupported_text_generation_kwargs, + _import_numpy, + _set_unsupported_text_generation_kwargs, + raise_text_generation_error, +) +from huggingface_hub.inference._generated.types import ( + AudioClassificationOutputElement, + AudioClassificationOutputTransform, + AudioToAudioOutputElement, + AutomaticSpeechRecognitionOutput, + ChatCompletionInputGrammarType, + ChatCompletionInputMessage, + ChatCompletionInputStreamOptions, + ChatCompletionInputTool, + ChatCompletionInputToolChoiceClass, + ChatCompletionInputToolChoiceEnum, + ChatCompletionOutput, + ChatCompletionStreamOutput, + DocumentQuestionAnsweringOutputElement, + FillMaskOutputElement, + ImageClassificationOutputElement, + ImageClassificationOutputTransform, + ImageSegmentationOutputElement, + ImageSegmentationSubtask, + ImageToImageTargetSize, + ImageToTextOutput, + ImageToVideoTargetSize, + ObjectDetectionOutputElement, + Padding, + QuestionAnsweringOutputElement, + SummarizationOutput, + SummarizationTruncationStrategy, + TableQuestionAnsweringOutputElement, + TextClassificationOutputElement, + TextClassificationOutputTransform, + TextGenerationInputGrammarType, + TextGenerationOutput, + TextGenerationStreamOutput, + TextToSpeechEarlyStoppingEnum, + TokenClassificationAggregationStrategy, + TokenClassificationOutputElement, + TranslationOutput, + TranslationTruncationStrategy, + VisualQuestionAnsweringOutputElement, + ZeroShotClassificationOutputElement, + ZeroShotImageClassificationOutputElement, +) +from huggingface_hub.inference._providers import PROVIDER_OR_POLICY_T, get_provider_helper +from huggingface_hub.utils import build_hf_headers +from huggingface_hub.utils._auth import get_token + +from .._common import _async_yield_from, _import_aiohttp + + +if TYPE_CHECKING: + import numpy as np + from aiohttp import ClientResponse, ClientSession + from PIL.Image import Image + +logger = logging.getLogger(__name__) + + +MODEL_KWARGS_NOT_USED_REGEX = re.compile(r"The following `model_kwargs` are not used by the model: \[(.*?)\]") + + +class AsyncInferenceClient: + """ + Initialize a new Inference Client. + + [`InferenceClient`] aims to provide a unified experience to perform inference. The client can be used + seamlessly with either the (free) Inference API, self-hosted Inference Endpoints, or third-party Inference Providers. + + Args: + model (`str`, `optional`): + The model to run inference with. Can be a model id hosted on the Hugging Face Hub, e.g. `meta-llama/Meta-Llama-3-8B-Instruct` + or a URL to a deployed Inference Endpoint. Defaults to None, in which case a recommended model is + automatically selected for the task. + Note: for better compatibility with OpenAI's client, `model` has been aliased as `base_url`. Those 2 + arguments are mutually exclusive. If a URL is passed as `model` or `base_url` for chat completion, the `(/v1)/chat/completions` suffix path will be appended to the URL. 
+ provider (`str`, *optional*):
+ Name of the provider to use for inference. Can be `"black-forest-labs"`, `"cerebras"`, `"clarifai"`, `"cohere"`, `"fal-ai"`, `"featherless-ai"`, `"fireworks-ai"`, `"groq"`, `"hf-inference"`, `"hyperbolic"`, `"nebius"`, `"novita"`, `"nscale"`, `"openai"`, `"publicai"`, `"replicate"`, `"sambanova"`, `"scaleway"`, `"together"` or `"zai-org"`.
+ Defaults to "auto" i.e. the first of the providers available for the model, sorted by the user's order in https://hf.co/settings/inference-providers.
+ If model is a URL or `base_url` is passed, then `provider` is not used.
+ token (`str`, *optional*):
+ Hugging Face token. Will default to the locally saved token if not provided.
+ Note: for better compatibility with OpenAI's client, `token` has been aliased as `api_key`. Those 2
+ arguments are mutually exclusive and have the exact same behavior.
+ timeout (`float`, `optional`):
+ The maximum number of seconds to wait for a response from the server. Defaults to None, meaning it will loop until the server is available.
+ headers (`Dict[str, str]`, `optional`):
+ Additional headers to send to the server. By default only the authorization and user-agent headers are sent.
+ Values in this dictionary will override the default values.
+ bill_to (`str`, `optional`):
+ The billing account to use for the requests. By default the requests are billed on the user's account.
+ Requests can only be billed to an organization the user is a member of, and which has subscribed to Enterprise Hub.
+ cookies (`Dict[str, str]`, `optional`):
+ Additional cookies to send to the server.
+ trust_env (`bool`, `optional`):
+ Trust environment settings for proxy configuration if the parameter is `True` (`False` by default).
+ proxies (`Any`, `optional`):
+ Proxies to use for the request.
+ base_url (`str`, `optional`):
+ Base URL to run inference. This is a duplicated argument from `model` to make [`InferenceClient`]
+ follow the same pattern as `openai.OpenAI` client. Cannot be used if `model` is set. Defaults to None.
+ api_key (`str`, `optional`):
+ Token to use for authentication. This is a duplicated argument from `token` to make [`InferenceClient`]
+ follow the same pattern as `openai.OpenAI` client. Cannot be used if `token` is set. Defaults to None.
+ """
+
+ def __init__(
+ self,
+ model: Optional[str] = None,
+ *,
+ provider: Optional[PROVIDER_OR_POLICY_T] = None,
+ token: Optional[str] = None,
+ timeout: Optional[float] = None,
+ headers: Optional[Dict[str, str]] = None,
+ cookies: Optional[Dict[str, str]] = None,
+ trust_env: bool = False,
+ proxies: Optional[Any] = None,
+ bill_to: Optional[str] = None,
+ # OpenAI compatibility
+ base_url: Optional[str] = None,
+ api_key: Optional[str] = None,
+ ) -> None:
+ if model is not None and base_url is not None:
+ raise ValueError(
+ "Received both `model` and `base_url` arguments. Please provide only one of them."
+ " `base_url` is an alias for `model` to make the API compatible with OpenAI's client."
+ " If using `base_url` for chat completion, the `/chat/completions` suffix path will be appended to the base url."
+ " When passing a URL as `model`, the client will not append any suffix path to it."
+ )
+ if token is not None and api_key is not None:
+ raise ValueError(
+ "Received both `token` and `api_key` arguments. Please provide only one of them."
+ " `api_key` is an alias for `token` to make the API compatible with OpenAI's client."
+ " It has the exact same behavior as `token`."
+ )
+ token = token if token is not None else api_key
+ if isinstance(token, bool):
+ # Legacy behavior: previously it was possible to pass `token=False` to disable authentication. This is not
+ # supported anymore as authentication is required. Better to explicitly raise here rather than risking
+ # sending the locally saved token without the user knowing about it.
+ if token is False:
+ raise ValueError(
+ "Cannot use `token=False` to disable authentication as authentication is required to run Inference."
+ )
+ warnings.warn(
+ "Using `token=True` to automatically use the locally saved token is deprecated and will be removed in a future release. "
+ "Please use `token=None` instead (default).",
+ DeprecationWarning,
+ )
+ token = get_token()
+
+ self.model: Optional[str] = base_url or model
+ self.token: Optional[str] = token
+
+ self.headers = {**headers} if headers is not None else {}
+ if bill_to is not None:
+ if (
+ constants.HUGGINGFACE_HEADER_X_BILL_TO in self.headers
+ and self.headers[constants.HUGGINGFACE_HEADER_X_BILL_TO] != bill_to
+ ):
+ warnings.warn(
+ f"Overriding existing '{self.headers[constants.HUGGINGFACE_HEADER_X_BILL_TO]}' value in headers with '{bill_to}'.",
+ UserWarning,
+ )
+ self.headers[constants.HUGGINGFACE_HEADER_X_BILL_TO] = bill_to
+
+ if token is not None and not token.startswith("hf_"):
+ warnings.warn(
+ "You've provided an external provider's API key, so requests will be billed directly by the provider. "
+ "The `bill_to` parameter is only applicable for Hugging Face billing and will be ignored.",
+ UserWarning,
+ )
+
+ # Configure provider
+ self.provider = provider
+
+ self.cookies = cookies
+ self.timeout = timeout
+ self.trust_env = trust_env
+ self.proxies = proxies
+
+ # Keep track of the sessions to close them properly
+ self._sessions: Dict["ClientSession", Set["ClientResponse"]] = dict()
+
+ def __repr__(self):
+ return f"<AsyncInferenceClient(model='{self.model}', timeout={self.timeout})>"
+
+ @overload
+ async def _inner_post( # type: ignore[misc]
+ self, request_parameters: RequestParameters, *, stream: Literal[False] = ...
+ ) -> bytes: ...
+
+ @overload
+ async def _inner_post( # type: ignore[misc]
+ self, request_parameters: RequestParameters, *, stream: Literal[True] = ...
+ ) -> AsyncIterable[bytes]: ...
+
+ @overload
+ async def _inner_post(
+ self, request_parameters: RequestParameters, *, stream: bool = False
+ ) -> Union[bytes, AsyncIterable[bytes]]: ...
+
+ async def _inner_post(
+ self, request_parameters: RequestParameters, *, stream: bool = False
+ ) -> Union[bytes, AsyncIterable[bytes]]:
+ """Make a request to the inference server."""
+
+ aiohttp = _import_aiohttp()
+
+ # TODO: this should be handled in provider helpers directly
+ if request_parameters.task in TASKS_EXPECTING_IMAGES and "Accept" not in request_parameters.headers:
+ request_parameters.headers["Accept"] = "image/png"
+
+ # Do not use context manager as we don't want to close the connection immediately when returning
+ # a stream
+ session = self._get_client_session(headers=request_parameters.headers)
+
+ try:
+ response = await session.post(
+ request_parameters.url, json=request_parameters.json, data=request_parameters.data, proxy=self.proxies
+ )
+ response_error_payload = None
+ if response.status != 200:
+ try:
+ response_error_payload = await response.json() # get payload before connection closed
+ except Exception:
+ pass
+ response.raise_for_status()
+ if stream:
+ return _async_yield_from(session, response)
+ else:
+ content = await response.read()
+ await session.close()
+ return content
+ except asyncio.TimeoutError as error:
+ await session.close()
+ # Convert any `TimeoutError` to an `InferenceTimeoutError`
+ raise InferenceTimeoutError(f"Inference call timed out: {request_parameters.url}") from error # type: ignore
+ except aiohttp.ClientResponseError as error:
+ error.response_error_payload = response_error_payload
+ await session.close()
+ raise error
+ except Exception:
+ await session.close()
+ raise
+
+ async def __aenter__(self):
+ return self
+
+ async def __aexit__(self, exc_type, exc_value, traceback):
+ await self.close()
+
+ def __del__(self):
+ if len(self._sessions) > 0:
+ warnings.warn(
+ "Deleting 'AsyncInferenceClient' client but some sessions are still open. "
+ "This can happen if you've stopped streaming data from the server before the stream was complete. "
+ "To close the client properly, you must call `await client.close()` "
+ "or use an async context (e.g. `async with AsyncInferenceClient(): ...`)."
+ )
+
+ async def close(self):
+ """Close all open sessions.
+
+ By default, 'aiohttp.ClientSession' objects are closed automatically when a call is completed. However, if you
+ are streaming data from the server and you stop before the stream is complete, you must call this method to
+ close the session properly.
+
+ Another possibility is to use an async context (e.g. `async with AsyncInferenceClient(): ...`).
+ """
+ await asyncio.gather(*[session.close() for session in self._sessions.keys()])
+
+ async def audio_classification(
+ self,
+ audio: ContentT,
+ *,
+ model: Optional[str] = None,
+ top_k: Optional[int] = None,
+ function_to_apply: Optional["AudioClassificationOutputTransform"] = None,
+ ) -> List[AudioClassificationOutputElement]:
+ """
+ Perform audio classification on the provided audio content.
+
+ Args:
+ audio (Union[str, Path, bytes, BinaryIO]):
+ The audio content to classify. It can be raw audio bytes, a local audio file, or a URL pointing to an
+ audio file.
+ model (`str`, *optional*):
+ The model to use for audio classification. Can be a model ID hosted on the Hugging Face Hub
+ or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for
+ audio classification will be used.
+ top_k (`int`, *optional*):
+ When specified, limits the output to the top K most probable classes.
+ function_to_apply (`"AudioClassificationOutputTransform"`, *optional*):
+ The function to apply to the model outputs in order to retrieve the scores.
+
+ Returns:
+ `List[AudioClassificationOutputElement]`: List of [`AudioClassificationOutputElement`] items containing the predicted labels and their confidence.
+
+ Raises:
+ [`InferenceTimeoutError`]:
+ If the model is unavailable or the request times out.
+ `aiohttp.ClientResponseError`:
+ If the request fails with an HTTP error status code other than HTTP 503.
+
+ Example:
+ ```py
+ # Must be run in an async context
+ >>> from huggingface_hub import AsyncInferenceClient
+ >>> client = AsyncInferenceClient()
+ >>> await client.audio_classification("audio.flac")
+ [
+ AudioClassificationOutputElement(score=0.4976358711719513, label='hap'),
+ AudioClassificationOutputElement(score=0.3677836060523987, label='neu'),
+ ...
+ ]
+ ```
+ """
+ model_id = model or self.model
+ provider_helper = get_provider_helper(self.provider, task="audio-classification", model=model_id)
+ request_parameters = provider_helper.prepare_request(
+ inputs=audio,
+ parameters={"function_to_apply": function_to_apply, "top_k": top_k},
+ headers=self.headers,
+ model=model_id,
+ api_key=self.token,
+ )
+ response = await self._inner_post(request_parameters)
+ return AudioClassificationOutputElement.parse_obj_as_list(response)
+
+ async def audio_to_audio(
+ self,
+ audio: ContentT,
+ *,
+ model: Optional[str] = None,
+ ) -> List[AudioToAudioOutputElement]:
+ """
+ Performs multiple tasks related to audio-to-audio depending on the model (e.g. speech enhancement, source separation).
+
+ Args:
+ audio (Union[str, Path, bytes, BinaryIO]):
+ The audio content for the model. It can be raw audio bytes, a local audio file, or a URL pointing to an
+ audio file.
+ model (`str`, *optional*):
+ The model can be any model which takes an audio file and returns another audio file. Can be a model ID hosted on the Hugging Face Hub
+ or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for
+ audio_to_audio will be used.
+
+ Returns:
+ `List[AudioToAudioOutputElement]`: A list of [`AudioToAudioOutputElement`] items containing the audio label, content-type, and audio content as a blob.
+
+ Raises:
+ `InferenceTimeoutError`:
+ If the model is unavailable or the request times out.
+ `aiohttp.ClientResponseError`:
+ If the request fails with an HTTP error status code other than HTTP 503.
+
+        Example:
+        ```py
+        # Must be run in an async context
+        >>> from huggingface_hub import AsyncInferenceClient
+        >>> client = AsyncInferenceClient()
+        >>> audio_output = await client.audio_to_audio("audio.flac")
+        >>> for i, item in enumerate(audio_output):
+        ...     with open(f"output_{i}.flac", "wb") as f:
+        ...         f.write(item.blob)
+        ```
+        """
+        model_id = model or self.model
+        provider_helper = get_provider_helper(self.provider, task="audio-to-audio", model=model_id)
+        request_parameters = provider_helper.prepare_request(
+            inputs=audio,
+            parameters={},
+            headers=self.headers,
+            model=model_id,
+            api_key=self.token,
+        )
+        response = await self._inner_post(request_parameters)
+        audio_output = AudioToAudioOutputElement.parse_obj_as_list(response)
+        for item in audio_output:
+            item.blob = base64.b64decode(item.blob)
+        return audio_output
+
+    async def automatic_speech_recognition(
+        self,
+        audio: ContentT,
+        *,
+        model: Optional[str] = None,
+        extra_body: Optional[Dict] = None,
+    ) -> AutomaticSpeechRecognitionOutput:
+        """
+        Perform automatic speech recognition (ASR or audio-to-text) on the given audio content.
+
+        Args:
+            audio (Union[str, Path, bytes, BinaryIO]):
+                The content to transcribe. It can be raw audio bytes, a local audio file, or a URL to an audio file.
+            model (`str`, *optional*):
+                The model to use for ASR. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
+                Inference Endpoint. If not provided, the default recommended model for ASR will be used.
+            extra_body (`Dict`, *optional*):
+                Additional provider-specific parameters to pass to the model. Refer to the provider's documentation
+                for supported parameters.
+        Returns:
+            [`AutomaticSpeechRecognitionOutput`]: An item containing the transcribed text and optionally the timestamp chunks.
+
+        Raises:
+            [`InferenceTimeoutError`]:
+                If the model is unavailable or the request times out.
+            `aiohttp.ClientResponseError`:
+                If the request fails with an HTTP error status code other than HTTP 503.
+
+        Example:
+        ```py
+        # Must be run in an async context
+        >>> from huggingface_hub import AsyncInferenceClient
+        >>> client = AsyncInferenceClient()
+        >>> output = await client.automatic_speech_recognition("hello_world.flac")
+        >>> output.text
+        "hello world"
+        ```
+        """
+        model_id = model or self.model
+        provider_helper = get_provider_helper(self.provider, task="automatic-speech-recognition", model=model_id)
+        request_parameters = provider_helper.prepare_request(
+            inputs=audio,
+            parameters={**(extra_body or {})},
+            headers=self.headers,
+            model=model_id,
+            api_key=self.token,
+        )
+        response = await self._inner_post(request_parameters)
+        return AutomaticSpeechRecognitionOutput.parse_obj_as_instance(response)
+
+    @overload
+    async def chat_completion(  # type: ignore
+        self,
+        messages: List[Union[Dict, ChatCompletionInputMessage]],
+        *,
+        model: Optional[str] = None,
+        stream: Literal[False] = False,
+        frequency_penalty: Optional[float] = None,
+        logit_bias: Optional[List[float]] = None,
+        logprobs: Optional[bool] = None,
+        max_tokens: Optional[int] = None,
+        n: Optional[int] = None,
+        presence_penalty: Optional[float] = None,
+        response_format: Optional[ChatCompletionInputGrammarType] = None,
+        seed: Optional[int] = None,
+        stop: Optional[List[str]] = None,
+        stream_options: Optional[ChatCompletionInputStreamOptions] = None,
+        temperature: Optional[float] = None,
+        tool_choice: Optional[Union[ChatCompletionInputToolChoiceClass, "ChatCompletionInputToolChoiceEnum"]] = None,
+        tool_prompt: Optional[str] = None,
+        tools: Optional[List[ChatCompletionInputTool]] = None,
+        top_logprobs: Optional[int] = None,
+        top_p: Optional[float] = None,
+        extra_body: Optional[Dict] = None,
+    ) -> ChatCompletionOutput: ...
+
+    @overload
+    async def chat_completion(  # type: ignore
+        self,
+        messages: List[Union[Dict, ChatCompletionInputMessage]],
+        *,
+        model: Optional[str] = None,
+        stream: Literal[True] = True,
+        frequency_penalty: Optional[float] = None,
+        logit_bias: Optional[List[float]] = None,
+        logprobs: Optional[bool] = None,
+        max_tokens: Optional[int] = None,
+        n: Optional[int] = None,
+        presence_penalty: Optional[float] = None,
+        response_format: Optional[ChatCompletionInputGrammarType] = None,
+        seed: Optional[int] = None,
+        stop: Optional[List[str]] = None,
+        stream_options: Optional[ChatCompletionInputStreamOptions] = None,
+        temperature: Optional[float] = None,
+        tool_choice: Optional[Union[ChatCompletionInputToolChoiceClass, "ChatCompletionInputToolChoiceEnum"]] = None,
+        tool_prompt: Optional[str] = None,
+        tools: Optional[List[ChatCompletionInputTool]] = None,
+        top_logprobs: Optional[int] = None,
+        top_p: Optional[float] = None,
+        extra_body: Optional[Dict] = None,
+    ) -> AsyncIterable[ChatCompletionStreamOutput]: ...
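+
+    # Note: the `chat_completion` overloads narrow the return type for static type checkers
+    # based on the literal value of `stream` (`ChatCompletionOutput` when False,
+    # `AsyncIterable[ChatCompletionStreamOutput]` when True); the actual request logic lives
+    # in the final, non-overloaded definition below.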
+ + @overload + async def chat_completion( + self, + messages: List[Union[Dict, ChatCompletionInputMessage]], + *, + model: Optional[str] = None, + stream: bool = False, + frequency_penalty: Optional[float] = None, + logit_bias: Optional[List[float]] = None, + logprobs: Optional[bool] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[ChatCompletionInputGrammarType] = None, + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stream_options: Optional[ChatCompletionInputStreamOptions] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[ChatCompletionInputToolChoiceClass, "ChatCompletionInputToolChoiceEnum"]] = None, + tool_prompt: Optional[str] = None, + tools: Optional[List[ChatCompletionInputTool]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + extra_body: Optional[Dict] = None, + ) -> Union[ChatCompletionOutput, AsyncIterable[ChatCompletionStreamOutput]]: ... + + async def chat_completion( + self, + messages: List[Union[Dict, ChatCompletionInputMessage]], + *, + model: Optional[str] = None, + stream: bool = False, + # Parameters from ChatCompletionInput (handled manually) + frequency_penalty: Optional[float] = None, + logit_bias: Optional[List[float]] = None, + logprobs: Optional[bool] = None, + max_tokens: Optional[int] = None, + n: Optional[int] = None, + presence_penalty: Optional[float] = None, + response_format: Optional[ChatCompletionInputGrammarType] = None, + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stream_options: Optional[ChatCompletionInputStreamOptions] = None, + temperature: Optional[float] = None, + tool_choice: Optional[Union[ChatCompletionInputToolChoiceClass, "ChatCompletionInputToolChoiceEnum"]] = None, + tool_prompt: Optional[str] = None, + tools: Optional[List[ChatCompletionInputTool]] = None, + top_logprobs: Optional[int] = None, + top_p: Optional[float] = None, + extra_body: Optional[Dict] = None, + ) -> Union[ChatCompletionOutput, AsyncIterable[ChatCompletionStreamOutput]]: + """ + A method for completing conversations using a specified language model. + + > [!TIP] + > The `client.chat_completion` method is aliased as `client.chat.completions.create` for compatibility with OpenAI's client. + > Inputs and outputs are strictly the same and using either syntax will yield the same results. + > Check out the [Inference guide](https://huggingface.co/docs/huggingface_hub/guides/inference#openai-compatibility) + > for more details about OpenAI's compatibility. + + > [!TIP] + > You can pass provider-specific parameters to the model by using the `extra_body` argument. + + Args: + messages (List of [`ChatCompletionInputMessage`]): + Conversation history consisting of roles and content pairs. + model (`str`, *optional*): + The model to use for chat-completion. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed + Inference Endpoint. If not provided, the default recommended model for chat-based text-generation will be used. + See https://huggingface.co/tasks/text-generation for more details. + If `model` is a model ID, it is passed to the server as the `model` parameter. If you want to define a + custom URL while setting `model` in the request payload, you must set `base_url` when initializing [`InferenceClient`]. + frequency_penalty (`float`, *optional*): + Penalizes new tokens based on their existing frequency + in the text so far. Range: [-2.0, 2.0]. Defaults to 0.0. 
+ logit_bias (`List[float]`, *optional*): + Adjusts the likelihood of specific tokens appearing in the generated output. + logprobs (`bool`, *optional*): + Whether to return log probabilities of the output tokens or not. If true, returns the log + probabilities of each output token returned in the content of message. + max_tokens (`int`, *optional*): + Maximum number of tokens allowed in the response. Defaults to 100. + n (`int`, *optional*): + The number of completions to generate for each prompt. + presence_penalty (`float`, *optional*): + Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the + text so far, increasing the model's likelihood to talk about new topics. + response_format ([`ChatCompletionInputGrammarType`], *optional*): + Grammar constraints. Can be either a JSONSchema or a regex. + seed (Optional[`int`], *optional*): + Seed for reproducible control flow. Defaults to None. + stop (`List[str]`, *optional*): + Up to four strings which trigger the end of the response. + Defaults to None. + stream (`bool`, *optional*): + Enable realtime streaming of responses. Defaults to False. + stream_options ([`ChatCompletionInputStreamOptions`], *optional*): + Options for streaming completions. + temperature (`float`, *optional*): + Controls randomness of the generations. Lower values ensure + less random completions. Range: [0, 2]. Defaults to 1.0. + top_logprobs (`int`, *optional*): + An integer between 0 and 5 specifying the number of most likely tokens to return at each token + position, each with an associated log probability. logprobs must be set to true if this parameter is + used. + top_p (`float`, *optional*): + Fraction of the most likely next words to sample from. + Must be between 0 and 1. Defaults to 1.0. + tool_choice ([`ChatCompletionInputToolChoiceClass`] or [`ChatCompletionInputToolChoiceEnum`], *optional*): + The tool to use for the completion. Defaults to "auto". + tool_prompt (`str`, *optional*): + A prompt to be appended before the tools. + tools (List of [`ChatCompletionInputTool`], *optional*): + A list of tools the model may call. Currently, only functions are supported as a tool. Use this to + provide a list of functions the model may generate JSON inputs for. + extra_body (`Dict`, *optional*): + Additional provider-specific parameters to pass to the model. Refer to the provider's documentation + for supported parameters. + Returns: + [`ChatCompletionOutput`] or Iterable of [`ChatCompletionStreamOutput`]: + Generated text returned from the server: + - if `stream=False`, the generated text is returned as a [`ChatCompletionOutput`] (default). + - if `stream=True`, the generated text is returned token by token as a sequence of [`ChatCompletionStreamOutput`]. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. 
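+
+        > [!TIP]
+        > When `stream=True`, the awaited call returns an async iterable that must be consumed with `async for`.
+        > A minimal sketch, assuming an already-configured `client` and a `messages` list:
+
+        ```py
+        >>> async for chunk in await client.chat_completion(messages, stream=True):
+        ...     print(chunk.choices[0].delta.content)
+        ```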
+
+        Example:
+
+        ```py
+        # Must be run in an async context
+        >>> from huggingface_hub import AsyncInferenceClient
+        >>> messages = [{"role": "user", "content": "What is the capital of France?"}]
+        >>> client = AsyncInferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")
+        >>> await client.chat_completion(messages, max_tokens=100)
+        ChatCompletionOutput(
+            choices=[
+                ChatCompletionOutputComplete(
+                    finish_reason='eos_token',
+                    index=0,
+                    message=ChatCompletionOutputMessage(
+                        role='assistant',
+                        content='The capital of France is Paris.',
+                        name=None,
+                        tool_calls=None
+                    ),
+                    logprobs=None
+                )
+            ],
+            created=1719907176,
+            id='',
+            model='meta-llama/Meta-Llama-3-8B-Instruct',
+            object='text_completion',
+            system_fingerprint='2.0.4-sha-f426a33',
+            usage=ChatCompletionOutputUsage(
+                completion_tokens=8,
+                prompt_tokens=17,
+                total_tokens=25
+            )
+        )
+        ```
+
+        Example using streaming:
+        ```py
+        # Must be run in an async context
+        >>> from huggingface_hub import AsyncInferenceClient
+        >>> messages = [{"role": "user", "content": "What is the capital of France?"}]
+        >>> client = AsyncInferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")
+        >>> async for token in await client.chat_completion(messages, max_tokens=10, stream=True):
+        ...     print(token)
+        ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content='The', role='assistant'), index=0, finish_reason=None)], created=1710498504)
+        ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=' capital', role='assistant'), index=0, finish_reason=None)], created=1710498504)
+        (...)
+        ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=' may', role='assistant'), index=0, finish_reason=None)], created=1710498504)
+        ```
+
+        Example using OpenAI's syntax:
+        ```py
+        # Must be run in an async context
+        # instead of `from openai import OpenAI`
+        from huggingface_hub import AsyncInferenceClient
+
+        # instead of `client = OpenAI(...)`
+        client = AsyncInferenceClient(
+            base_url=...,
+            api_key=...,
+        )
+
+        output = await client.chat.completions.create(
+            model="meta-llama/Meta-Llama-3-8B-Instruct",
+            messages=[
+                {"role": "system", "content": "You are a helpful assistant."},
+                {"role": "user", "content": "Count to 10"},
+            ],
+            stream=True,
+            max_tokens=1024,
+        )
+
+        async for chunk in output:
+            print(chunk.choices[0].delta.content)
+        ```
+
+        Example using a third-party provider directly with extra (provider-specific) parameters. Usage will be billed on your Together AI account.
+        ```py
+        >>> from huggingface_hub import InferenceClient
+        >>> client = InferenceClient(
+        ...     provider="together",  # Use Together AI provider
+        ...     api_key="",  # Pass your Together API key directly
+        ... )
+        >>> client.chat_completion(
+        ...     model="meta-llama/Meta-Llama-3-8B-Instruct",
+        ...     messages=[{"role": "user", "content": "What is the capital of France?"}],
+        ...     extra_body={"safety_model": "Meta-Llama/Llama-Guard-7b"},
+        ... )
+        ```
+
+        Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.
+        ```py
+        >>> from huggingface_hub import InferenceClient
+        >>> client = InferenceClient(
+        ...     provider="sambanova",  # Use Sambanova provider
+        ...     api_key="hf_...",  # Pass your HF token
+        ... )
+        >>> client.chat_completion(
+        ...     model="meta-llama/Meta-Llama-3-8B-Instruct",
+        ...     messages=[{"role": "user", "content": "What is the capital of France?"}],
+        ...
) + ``` + + Example using Image + Text as input: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + + # provide a remote URL + >>> image_url ="https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" + # or a base64-encoded image + >>> image_path = "/path/to/image.jpeg" + >>> with open(image_path, "rb") as f: + ... base64_image = base64.b64encode(f.read()).decode("utf-8") + >>> image_url = f"data:image/jpeg;base64,{base64_image}" + + >>> client = AsyncInferenceClient("meta-llama/Llama-3.2-11B-Vision-Instruct") + >>> output = await client.chat.completions.create( + ... messages=[ + ... { + ... "role": "user", + ... "content": [ + ... { + ... "type": "image_url", + ... "image_url": {"url": image_url}, + ... }, + ... { + ... "type": "text", + ... "text": "Describe this image in one sentence.", + ... }, + ... ], + ... }, + ... ], + ... ) + >>> output + The image depicts the iconic Statue of Liberty situated in New York Harbor, New York, on a clear day. + ``` + + Example using tools: + ```py + # Must be run in an async context + >>> client = AsyncInferenceClient("meta-llama/Meta-Llama-3-70B-Instruct") + >>> messages = [ + ... { + ... "role": "system", + ... "content": "Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous.", + ... }, + ... { + ... "role": "user", + ... "content": "What's the weather like the next 3 days in San Francisco, CA?", + ... }, + ... ] + >>> tools = [ + ... { + ... "type": "function", + ... "function": { + ... "name": "get_current_weather", + ... "description": "Get the current weather", + ... "parameters": { + ... "type": "object", + ... "properties": { + ... "location": { + ... "type": "string", + ... "description": "The city and state, e.g. San Francisco, CA", + ... }, + ... "format": { + ... "type": "string", + ... "enum": ["celsius", "fahrenheit"], + ... "description": "The temperature unit to use. Infer this from the users location.", + ... }, + ... }, + ... "required": ["location", "format"], + ... }, + ... }, + ... }, + ... { + ... "type": "function", + ... "function": { + ... "name": "get_n_day_weather_forecast", + ... "description": "Get an N-day weather forecast", + ... "parameters": { + ... "type": "object", + ... "properties": { + ... "location": { + ... "type": "string", + ... "description": "The city and state, e.g. San Francisco, CA", + ... }, + ... "format": { + ... "type": "string", + ... "enum": ["celsius", "fahrenheit"], + ... "description": "The temperature unit to use. Infer this from the users location.", + ... }, + ... "num_days": { + ... "type": "integer", + ... "description": "The number of days to forecast", + ... }, + ... }, + ... "required": ["location", "format", "num_days"], + ... }, + ... }, + ... }, + ... ] + + >>> response = await client.chat_completion( + ... model="meta-llama/Meta-Llama-3-70B-Instruct", + ... messages=messages, + ... tools=tools, + ... tool_choice="auto", + ... max_tokens=500, + ... ) + >>> response.choices[0].message.tool_calls[0].function + ChatCompletionOutputFunctionDefinition( + arguments={ + 'location': 'San Francisco, CA', + 'format': 'fahrenheit', + 'num_days': 3 + }, + name='get_n_day_weather_forecast', + description=None + ) + ``` + + Example using response_format: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient("meta-llama/Meta-Llama-3-70B-Instruct") + >>> messages = [ + ... 
{ + ... "role": "user", + ... "content": "I saw a puppy a cat and a raccoon during my bike ride in the park. What did I saw and when?", + ... }, + ... ] + >>> response_format = { + ... "type": "json", + ... "value": { + ... "properties": { + ... "location": {"type": "string"}, + ... "activity": {"type": "string"}, + ... "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5}, + ... "animals": {"type": "array", "items": {"type": "string"}}, + ... }, + ... "required": ["location", "activity", "animals_seen", "animals"], + ... }, + ... } + >>> response = await client.chat_completion( + ... messages=messages, + ... response_format=response_format, + ... max_tokens=500, + ... ) + >>> response.choices[0].message.content + '{\n\n"activity": "bike ride",\n"animals": ["puppy", "cat", "raccoon"],\n"animals_seen": 3,\n"location": "park"}' + ``` + """ + # Since `chat_completion(..., model=xxx)` is also a payload parameter for the server, we need to handle 'model' differently. + # `self.model` takes precedence over 'model' argument for building URL. + # `model` takes precedence for payload value. + model_id_or_url = self.model or model + payload_model = model or self.model + + # Get the provider helper + provider_helper = get_provider_helper( + self.provider, + task="conversational", + model=model_id_or_url + if model_id_or_url is not None and model_id_or_url.startswith(("http://", "https://")) + else payload_model, + ) + + # Prepare the payload + parameters = { + "model": payload_model, + "frequency_penalty": frequency_penalty, + "logit_bias": logit_bias, + "logprobs": logprobs, + "max_tokens": max_tokens, + "n": n, + "presence_penalty": presence_penalty, + "response_format": response_format, + "seed": seed, + "stop": stop, + "temperature": temperature, + "tool_choice": tool_choice, + "tool_prompt": tool_prompt, + "tools": tools, + "top_logprobs": top_logprobs, + "top_p": top_p, + "stream": stream, + "stream_options": stream_options, + **(extra_body or {}), + } + request_parameters = provider_helper.prepare_request( + inputs=messages, + parameters=parameters, + headers=self.headers, + model=model_id_or_url, + api_key=self.token, + ) + data = await self._inner_post(request_parameters, stream=stream) + + if stream: + return _async_stream_chat_completion_response(data) # type: ignore[arg-type] + + return ChatCompletionOutput.parse_obj_as_instance(data) # type: ignore[arg-type] + + async def document_question_answering( + self, + image: ContentT, + question: str, + *, + model: Optional[str] = None, + doc_stride: Optional[int] = None, + handle_impossible_answer: Optional[bool] = None, + lang: Optional[str] = None, + max_answer_len: Optional[int] = None, + max_question_len: Optional[int] = None, + max_seq_len: Optional[int] = None, + top_k: Optional[int] = None, + word_boxes: Optional[List[Union[List[float], str]]] = None, + ) -> List[DocumentQuestionAnsweringOutputElement]: + """ + Answer questions on document images. + + Args: + image (`Union[str, Path, bytes, BinaryIO]`): + The input image for the context. It can be raw bytes, an image file, or a URL to an online image. + question (`str`): + Question to be answered. + model (`str`, *optional*): + The model to use for the document question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. If not provided, the default recommended document question answering model will be used. + Defaults to None. 
+            doc_stride (`int`, *optional*):
+                If the words in the document are too long to fit with the question for the model, it will be split in
+                several chunks with some overlap. This argument controls the size of that overlap.
+            handle_impossible_answer (`bool`, *optional*):
+                Whether to accept impossible as an answer.
+            lang (`str`, *optional*):
+                Language to use while running OCR. Defaults to English.
+            max_answer_len (`int`, *optional*):
+                The maximum length of predicted answers (e.g., only answers with a shorter length are considered).
+            max_question_len (`int`, *optional*):
+                The maximum length of the question after tokenization. It will be truncated if needed.
+            max_seq_len (`int`, *optional*):
+                The maximum length of the total sentence (context + question) in tokens of each chunk passed to the
+                model. The context will be split in several chunks (using doc_stride as overlap) if needed.
+            top_k (`int`, *optional*):
+                The number of answers to return (will be chosen by order of likelihood). Can return fewer than top_k
+                answers if there are not enough options available within the context.
+            word_boxes (`List[Union[List[float], str]]`, *optional*):
+                A list of words and bounding boxes (normalized 0->1000). If provided, the inference will skip the OCR
+                step and use the provided bounding boxes instead.
+        Returns:
+            `List[DocumentQuestionAnsweringOutputElement]`: a list of [`DocumentQuestionAnsweringOutputElement`] items containing the predicted label, associated probability, word ids, and page number.
+
+        Raises:
+            [`InferenceTimeoutError`]:
+                If the model is unavailable or the request times out.
+            `aiohttp.ClientResponseError`:
+                If the request fails with an HTTP error status code other than HTTP 503.
+
+        Example:
+        ```py
+        # Must be run in an async context
+        >>> from huggingface_hub import AsyncInferenceClient
+        >>> client = AsyncInferenceClient()
+        >>> await client.document_question_answering(image="https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png", question="What is the invoice number?")
+        [DocumentQuestionAnsweringOutputElement(answer='us-001', end=16, score=0.9999666213989258, start=16)]
+        ```
+        """
+        model_id = model or self.model
+        provider_helper = get_provider_helper(self.provider, task="document-question-answering", model=model_id)
+        inputs: Dict[str, Any] = {"question": question, "image": _b64_encode(image)}
+        request_parameters = provider_helper.prepare_request(
+            inputs=inputs,
+            parameters={
+                "doc_stride": doc_stride,
+                "handle_impossible_answer": handle_impossible_answer,
+                "lang": lang,
+                "max_answer_len": max_answer_len,
+                "max_question_len": max_question_len,
+                "max_seq_len": max_seq_len,
+                "top_k": top_k,
+                "word_boxes": word_boxes,
+            },
+            headers=self.headers,
+            model=model_id,
+            api_key=self.token,
+        )
+        response = await self._inner_post(request_parameters)
+        return DocumentQuestionAnsweringOutputElement.parse_obj_as_list(response)
+
+    async def feature_extraction(
+        self,
+        text: str,
+        *,
+        normalize: Optional[bool] = None,
+        prompt_name: Optional[str] = None,
+        truncate: Optional[bool] = None,
+        truncation_direction: Optional[Literal["Left", "Right"]] = None,
+        model: Optional[str] = None,
+    ) -> "np.ndarray":
+        """
+        Generate embeddings for a given text.
+
+        Args:
+            text (`str`):
+                The text to embed.
+            model (`str`, *optional*):
+                The model to use for the feature extraction task. Can be a model ID hosted on the Hugging Face Hub or a URL to
+                a deployed Inference Endpoint. If not provided, the default recommended feature extraction model will be used.
+                Defaults to None.
+            normalize (`bool`, *optional*):
+                Whether to normalize the embeddings or not.
+                Only available on servers powered by Text-Embeddings-Inference.
+            prompt_name (`str`, *optional*):
+                The name of the prompt that should be used for encoding. If not set, no prompt will be applied.
+                Must be a key in the `Sentence Transformers` configuration `prompts` dictionary.
+                For example, if ``prompt_name`` is "query" and ``prompts`` is {"query": "query: ", ...},
+                then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?"
+                because the prompt text will be prepended before any text to encode.
+            truncate (`bool`, *optional*):
+                Whether to truncate the embeddings or not.
+                Only available on servers powered by Text-Embeddings-Inference.
+            truncation_direction (`Literal["Left", "Right"]`, *optional*):
+                Which side of the input should be truncated when `truncate=True` is passed.
+
+        Returns:
+            `np.ndarray`: The embedding representing the input text as a float32 numpy array.
+
+        Raises:
+            [`InferenceTimeoutError`]:
+                If the model is unavailable or the request times out.
+            `aiohttp.ClientResponseError`:
+                If the request fails with an HTTP error status code other than HTTP 503.
+
+        Example:
+        ```py
+        # Must be run in an async context
+        >>> from huggingface_hub import AsyncInferenceClient
+        >>> client = AsyncInferenceClient()
+        >>> await client.feature_extraction("Hi, who are you?")
+        array([[ 2.424802  ,  2.93384   ,  1.1750331 , ...,  1.240499  , -0.13776633, -0.7889173 ],
+        [-0.42943227, -0.6364878 , -1.693462  , ...,  0.41978157, -2.4336355 ,  0.6162071 ],
+        ...,
+        [ 0.28552425, -0.928395  , -1.2077185 , ...,  0.76810825, -2.1069427 ,  0.6236161 ]], dtype=float32)
+        ```
+        """
+        model_id = model or self.model
+        provider_helper = get_provider_helper(self.provider, task="feature-extraction", model=model_id)
+        request_parameters = provider_helper.prepare_request(
+            inputs=text,
+            parameters={
+                "normalize": normalize,
+                "prompt_name": prompt_name,
+                "truncate": truncate,
+                "truncation_direction": truncation_direction,
+            },
+            headers=self.headers,
+            model=model_id,
+            api_key=self.token,
+        )
+        response = await self._inner_post(request_parameters)
+        np = _import_numpy()
+        return np.array(provider_helper.get_response(response), dtype="float32")
+
+    async def fill_mask(
+        self,
+        text: str,
+        *,
+        model: Optional[str] = None,
+        targets: Optional[List[str]] = None,
+        top_k: Optional[int] = None,
+    ) -> List[FillMaskOutputElement]:
+        """
+        Fill in a hole with a missing word (token to be precise).
+
+        Args:
+            text (`str`):
+                A string to be filled from; it must contain the [MASK] token (check the model card for the exact name of the mask).
+            model (`str`, *optional*):
+                The model to use for the fill mask task. Can be a model ID hosted on the Hugging Face Hub or a URL to
+                a deployed Inference Endpoint. If not provided, the default recommended fill mask model will be used.
+            targets (`List[str]`, *optional*):
+                When passed, the model will limit the scores to the passed targets instead of looking up in the whole
+                vocabulary. If the provided targets are not in the model vocab, they will be tokenized and the first
+                resulting token will be used (with a warning, and that might be slower).
+            top_k (`int`, *optional*):
+                When passed, overrides the number of predictions to return.
+
+        Returns:
+            `List[FillMaskOutputElement]`: a list of [`FillMaskOutputElement`] items containing the predicted label, associated
+            probability, token reference, and completed text.
+
+        Raises:
+            [`InferenceTimeoutError`]:
+                If the model is unavailable or the request times out.
+            `aiohttp.ClientResponseError`:
+                If the request fails with an HTTP error status code other than HTTP 503.
+
+        Example:
+        ```py
+        # Must be run in an async context
+        >>> from huggingface_hub import AsyncInferenceClient
+        >>> client = AsyncInferenceClient()
+        >>> await client.fill_mask("The goal of life is <mask>.")
+        [
+            FillMaskOutputElement(score=0.06897063553333282, token=11098, token_str=' happiness', sequence='The goal of life is happiness.'),
+            FillMaskOutputElement(score=0.06554922461509705, token=45075, token_str=' immortality', sequence='The goal of life is immortality.')
+        ]
+        ```
+        """
+        model_id = model or self.model
+        provider_helper = get_provider_helper(self.provider, task="fill-mask", model=model_id)
+        request_parameters = provider_helper.prepare_request(
+            inputs=text,
+            parameters={"targets": targets, "top_k": top_k},
+            headers=self.headers,
+            model=model_id,
+            api_key=self.token,
+        )
+        response = await self._inner_post(request_parameters)
+        return FillMaskOutputElement.parse_obj_as_list(response)
+
+    async def image_classification(
+        self,
+        image: ContentT,
+        *,
+        model: Optional[str] = None,
+        function_to_apply: Optional["ImageClassificationOutputTransform"] = None,
+        top_k: Optional[int] = None,
+    ) -> List[ImageClassificationOutputElement]:
+        """
+        Perform image classification on the given image using the specified model.
+
+        Args:
+            image (`Union[str, Path, bytes, BinaryIO, PIL.Image.Image]`):
+                The image to classify. It can be raw bytes, an image file, a URL to an online image, or a PIL Image.
+            model (`str`, *optional*):
+                The model to use for image classification. Can be a model ID hosted on the Hugging Face Hub or a URL to a
+                deployed Inference Endpoint. If not provided, the default recommended model for image classification will be used.
+            function_to_apply (`"ImageClassificationOutputTransform"`, *optional*):
+                The function to apply to the model outputs in order to retrieve the scores.
+            top_k (`int`, *optional*):
+                When specified, limits the output to the top K most probable classes.
+        Returns:
+            `List[ImageClassificationOutputElement]`: a list of [`ImageClassificationOutputElement`] items containing the predicted label and associated probability.
+
+        Raises:
+            [`InferenceTimeoutError`]:
+                If the model is unavailable or the request times out.
+            `aiohttp.ClientResponseError`:
+                If the request fails with an HTTP error status code other than HTTP 503.
+
+        Example:
+        ```py
+        # Must be run in an async context
+        >>> from huggingface_hub import AsyncInferenceClient
+        >>> client = AsyncInferenceClient()
+        >>> await client.image_classification("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
+        [ImageClassificationOutputElement(label='Blenheim spaniel', score=0.9779096841812134), ...]
+        ```
+        """
+        model_id = model or self.model
+        provider_helper = get_provider_helper(self.provider, task="image-classification", model=model_id)
+        request_parameters = provider_helper.prepare_request(
+            inputs=image,
+            parameters={"function_to_apply": function_to_apply, "top_k": top_k},
+            headers=self.headers,
+            model=model_id,
+            api_key=self.token,
+        )
+        response = await self._inner_post(request_parameters)
+        return ImageClassificationOutputElement.parse_obj_as_list(response)
+
+    async def image_segmentation(
+        self,
+        image: ContentT,
+        *,
+        model: Optional[str] = None,
+        mask_threshold: Optional[float] = None,
+        overlap_mask_area_threshold: Optional[float] = None,
+        subtask: Optional["ImageSegmentationSubtask"] = None,
+        threshold: Optional[float] = None,
+    ) -> List[ImageSegmentationOutputElement]:
+        """
+        Perform image segmentation on the given image using the specified model.
+
+        > [!WARNING]
+        > You must have `PIL` installed if you want to work with images (`pip install Pillow`).
+
+        Args:
+            image (`Union[str, Path, bytes, BinaryIO, PIL.Image.Image]`):
+                The image to segment. It can be raw bytes, an image file, a URL to an online image, or a PIL Image.
+            model (`str`, *optional*):
+                The model to use for image segmentation. Can be a model ID hosted on the Hugging Face Hub or a URL to a
+                deployed Inference Endpoint. If not provided, the default recommended model for image segmentation will be used.
+            mask_threshold (`float`, *optional*):
+                Threshold to use when turning the predicted masks into binary values.
+            overlap_mask_area_threshold (`float`, *optional*):
+                Mask overlap threshold to eliminate small, disconnected segments.
+            subtask (`"ImageSegmentationSubtask"`, *optional*):
+                Segmentation task to be performed, depending on model capabilities.
+            threshold (`float`, *optional*):
+                Probability threshold to filter out predicted masks.
+        Returns:
+            `List[ImageSegmentationOutputElement]`: A list of [`ImageSegmentationOutputElement`] items containing the segmented masks and associated attributes.
+
+        Raises:
+            [`InferenceTimeoutError`]:
+                If the model is unavailable or the request times out.
+            `aiohttp.ClientResponseError`:
+                If the request fails with an HTTP error status code other than HTTP 503.
+
+        Example:
+        ```py
+        # Must be run in an async context
+        >>> from huggingface_hub import AsyncInferenceClient
+        >>> client = AsyncInferenceClient()
+        >>> await client.image_segmentation("cat.jpg")
+        [ImageSegmentationOutputElement(score=0.989008, label='LABEL_184', mask=<PIL.Image.Image ...>), ...]
+ ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="image-segmentation", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=image, + parameters={ + "mask_threshold": mask_threshold, + "overlap_mask_area_threshold": overlap_mask_area_threshold, + "subtask": subtask, + "threshold": threshold, + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + output = ImageSegmentationOutputElement.parse_obj_as_list(response) + for item in output: + item.mask = _b64_to_image(item.mask) # type: ignore [assignment] + return output + + async def image_to_image( + self, + image: ContentT, + prompt: Optional[str] = None, + *, + negative_prompt: Optional[str] = None, + num_inference_steps: Optional[int] = None, + guidance_scale: Optional[float] = None, + model: Optional[str] = None, + target_size: Optional[ImageToImageTargetSize] = None, + **kwargs, + ) -> "Image": + """ + Perform image-to-image translation using a specified model. + + > [!WARNING] + > You must have `PIL` installed if you want to work with images (`pip install Pillow`). + + Args: + image (`Union[str, Path, bytes, BinaryIO, PIL.Image.Image]`): + The input image for translation. It can be raw bytes, an image file, a URL to an online image, or a PIL Image. + prompt (`str`, *optional*): + The text prompt to guide the image generation. + negative_prompt (`str`, *optional*): + One prompt to guide what NOT to include in image generation. + num_inference_steps (`int`, *optional*): + For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher + quality image at the expense of slower inference. + guidance_scale (`float`, *optional*): + For diffusion models. A higher guidance scale value encourages the model to generate images closely + linked to the text prompt at the expense of lower image quality. + model (`str`, *optional*): + The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed + Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None. + target_size (`ImageToImageTargetSize`, *optional*): + The size in pixels of the output image. This parameter is only supported by some providers and for + specific models. It will be ignored when unsupported. + + Returns: + `Image`: The translated image. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. 
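+
+        > [!TIP]
+        > The returned value is a `PIL.Image.Image`, so standard PIL methods such as `.save()` can be called on it
+        > directly, as in the example below.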
+
+        Example:
+        ```py
+        # Must be run in an async context
+        >>> from huggingface_hub import AsyncInferenceClient
+        >>> client = AsyncInferenceClient()
+        >>> image = await client.image_to_image("cat.jpg", prompt="turn the cat into a tiger")
+        >>> image.save("tiger.jpg")
+        ```
+        """
+        model_id = model or self.model
+        provider_helper = get_provider_helper(self.provider, task="image-to-image", model=model_id)
+        request_parameters = provider_helper.prepare_request(
+            inputs=image,
+            parameters={
+                "prompt": prompt,
+                "negative_prompt": negative_prompt,
+                "target_size": target_size,
+                "num_inference_steps": num_inference_steps,
+                "guidance_scale": guidance_scale,
+                **kwargs,
+            },
+            headers=self.headers,
+            model=model_id,
+            api_key=self.token,
+        )
+        response = await self._inner_post(request_parameters)
+        response = provider_helper.get_response(response, request_parameters)
+        return _bytes_to_image(response)
+
+    async def image_to_video(
+        self,
+        image: ContentT,
+        *,
+        model: Optional[str] = None,
+        prompt: Optional[str] = None,
+        negative_prompt: Optional[str] = None,
+        num_frames: Optional[float] = None,
+        num_inference_steps: Optional[int] = None,
+        guidance_scale: Optional[float] = None,
+        seed: Optional[int] = None,
+        target_size: Optional[ImageToVideoTargetSize] = None,
+        **kwargs,
+    ) -> bytes:
+        """
+        Generate a video from an input image.
+
+        Args:
+            image (`Union[str, Path, bytes, BinaryIO, PIL.Image.Image]`):
+                The input image to generate a video from. It can be raw bytes, an image file, a URL to an online image, or a PIL Image.
+            model (`str`, *optional*):
+                The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
+                Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
+            prompt (`str`, *optional*):
+                The text prompt to guide the video generation.
+            negative_prompt (`str`, *optional*):
+                One prompt to guide what NOT to include in video generation.
+            num_frames (`float`, *optional*):
+                The number of video frames to generate.
+            num_inference_steps (`int`, *optional*):
+                For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher
+                quality video at the expense of slower inference.
+            guidance_scale (`float`, *optional*):
+                For diffusion models. A higher guidance scale value encourages the model to generate videos closely
+                linked to the text prompt at the expense of lower image quality.
+            seed (`int`, *optional*):
+                The seed to use for the video generation.
+            target_size (`ImageToVideoTargetSize`, *optional*):
+                The size in pixels of the output video frames.
+
+        Returns:
+            `bytes`: The generated video.
+
+        Examples:
+        ```py
+        # Must be run in an async context
+        >>> from huggingface_hub import AsyncInferenceClient
+        >>> client = AsyncInferenceClient()
+        >>> video = await client.image_to_video("cat.jpg", model="Wan-AI/Wan2.2-I2V-A14B", prompt="turn the cat into a tiger")
+        >>> with open("tiger.mp4", "wb") as f:
+        ...
f.write(video) + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="image-to-video", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=image, + parameters={ + "prompt": prompt, + "negative_prompt": negative_prompt, + "num_frames": num_frames, + "num_inference_steps": num_inference_steps, + "guidance_scale": guidance_scale, + "seed": seed, + "target_size": target_size, + **kwargs, + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + response = provider_helper.get_response(response, request_parameters) + return response + + async def image_to_text(self, image: ContentT, *, model: Optional[str] = None) -> ImageToTextOutput: + """ + Takes an input image and return text. + + Models can have very different outputs depending on your use case (image captioning, optical character recognition + (OCR), Pix2Struct, etc). Please have a look to the model card to learn more about a model's specificities. + + Args: + image (`Union[str, Path, bytes, BinaryIO, PIL.Image.Image]`): + The input image to caption. It can be raw bytes, an image file, a URL to an online image, or a PIL Image. + model (`str`, *optional*): + The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed + Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None. + + Returns: + [`ImageToTextOutput`]: The generated text. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + >>> await client.image_to_text("cat.jpg") + 'a cat standing in a grassy field ' + >>> await client.image_to_text("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg") + 'a dog laying on the grass next to a flower pot ' + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="image-to-text", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=image, + parameters={}, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + output_list: List[ImageToTextOutput] = ImageToTextOutput.parse_obj_as_list(response) + return output_list[0] + + async def object_detection( + self, image: ContentT, *, model: Optional[str] = None, threshold: Optional[float] = None + ) -> List[ObjectDetectionOutputElement]: + """ + Perform object detection on the given image using the specified model. + + > [!WARNING] + > You must have `PIL` installed if you want to work with images (`pip install Pillow`). + + Args: + image (`Union[str, Path, bytes, BinaryIO, PIL.Image.Image]`): + The image to detect objects on. It can be raw bytes, an image file, a URL to an online image, or a PIL Image. + model (`str`, *optional*): + The model to use for object detection. Can be a model ID hosted on the Hugging Face Hub or a URL to a + deployed Inference Endpoint. If not provided, the default recommended model for object detection (DETR) will be used. + threshold (`float`, *optional*): + The probability necessary to make a prediction. 
+ Returns: + `List[ObjectDetectionOutputElement]`: A list of [`ObjectDetectionOutputElement`] items containing the bounding boxes and associated attributes. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. + `ValueError`: + If the request output is not a List. + + Example: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + >>> await client.object_detection("people.jpg") + [ObjectDetectionOutputElement(score=0.9486683011054993, label='person', box=ObjectDetectionBoundingBox(xmin=59, ymin=39, xmax=420, ymax=510)), ...] + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="object-detection", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=image, + parameters={"threshold": threshold}, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + return ObjectDetectionOutputElement.parse_obj_as_list(response) + + async def question_answering( + self, + question: str, + context: str, + *, + model: Optional[str] = None, + align_to_words: Optional[bool] = None, + doc_stride: Optional[int] = None, + handle_impossible_answer: Optional[bool] = None, + max_answer_len: Optional[int] = None, + max_question_len: Optional[int] = None, + max_seq_len: Optional[int] = None, + top_k: Optional[int] = None, + ) -> Union[QuestionAnsweringOutputElement, List[QuestionAnsweringOutputElement]]: + """ + Retrieve the answer to a question from a given text. + + Args: + question (`str`): + Question to be answered. + context (`str`): + The context of the question. + model (`str`): + The model to use for the question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. + align_to_words (`bool`, *optional*): + Attempts to align the answer to real words. Improves quality on space separated languages. Might hurt + on non-space-separated languages (like Japanese or Chinese) + doc_stride (`int`, *optional*): + If the context is too long to fit with the question for the model, it will be split in several chunks + with some overlap. This argument controls the size of that overlap. + handle_impossible_answer (`bool`, *optional*): + Whether to accept impossible as an answer. + max_answer_len (`int`, *optional*): + The maximum length of predicted answers (e.g., only answers with a shorter length are considered). + max_question_len (`int`, *optional*): + The maximum length of the question after tokenization. It will be truncated if needed. + max_seq_len (`int`, *optional*): + The maximum length of the total sentence (context + question) in tokens of each chunk passed to the + model. The context will be split in several chunks (using docStride as overlap) if needed. + top_k (`int`, *optional*): + The number of answers to return (will be chosen by order of likelihood). Note that we return less than + topk answers if there are not enough options available within the context. + + Returns: + Union[`QuestionAnsweringOutputElement`, List[`QuestionAnsweringOutputElement`]]: + When top_k is 1 or not provided, it returns a single `QuestionAnsweringOutputElement`. + When top_k is greater than 1, it returns a list of `QuestionAnsweringOutputElement`. 
+ Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + >>> await client.question_answering(question="What's my name?", context="My name is Clara and I live in Berkeley.") + QuestionAnsweringOutputElement(answer='Clara', end=16, score=0.9326565265655518, start=11) + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="question-answering", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs={"question": question, "context": context}, + parameters={ + "align_to_words": align_to_words, + "doc_stride": doc_stride, + "handle_impossible_answer": handle_impossible_answer, + "max_answer_len": max_answer_len, + "max_question_len": max_question_len, + "max_seq_len": max_seq_len, + "top_k": top_k, + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + # Parse the response as a single `QuestionAnsweringOutputElement` when top_k is 1 or not provided, or a list of `QuestionAnsweringOutputElement` to ensure backward compatibility. + output = QuestionAnsweringOutputElement.parse_obj(response) + return output + + async def sentence_similarity( + self, sentence: str, other_sentences: List[str], *, model: Optional[str] = None + ) -> List[float]: + """ + Compute the semantic similarity between a sentence and a list of other sentences by comparing their embeddings. + + Args: + sentence (`str`): + The main sentence to compare to others. + other_sentences (`List[str]`): + The list of sentences to compare to. + model (`str`, *optional*): + The model to use for the sentence similarity task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. If not provided, the default recommended sentence similarity model will be used. + Defaults to None. + + Returns: + `List[float]`: The similarity scores between the main sentence and the given comparison sentences. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + >>> await client.sentence_similarity( + ... "Machine learning is so easy.", + ... other_sentences=[ + ... "Deep learning is so straightforward.", + ... "This is so difficult, like rocket science.", + ... "I can't believe how much I struggled with this.", + ... ], + ... 
) + [0.7785726189613342, 0.45876261591911316, 0.2906220555305481] + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="sentence-similarity", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs={"source_sentence": sentence, "sentences": other_sentences}, + parameters={}, + extra_payload={}, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + return _bytes_to_list(response) + + async def summarization( + self, + text: str, + *, + model: Optional[str] = None, + clean_up_tokenization_spaces: Optional[bool] = None, + generate_parameters: Optional[Dict[str, Any]] = None, + truncation: Optional["SummarizationTruncationStrategy"] = None, + ) -> SummarizationOutput: + """ + Generate a summary of a given text using a specified model. + + Args: + text (`str`): + The input text to summarize. + model (`str`, *optional*): + The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed + Inference Endpoint. If not provided, the default recommended model for summarization will be used. + clean_up_tokenization_spaces (`bool`, *optional*): + Whether to clean up the potential extra spaces in the text output. + generate_parameters (`Dict[str, Any]`, *optional*): + Additional parametrization of the text generation algorithm. + truncation (`"SummarizationTruncationStrategy"`, *optional*): + The truncation strategy to use. + Returns: + [`SummarizationOutput`]: The generated summary text. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + >>> await client.summarization("The Eiffel tower...") + SummarizationOutput(generated_text="The Eiffel tower is one of the most famous landmarks in the world....") + ``` + """ + parameters = { + "clean_up_tokenization_spaces": clean_up_tokenization_spaces, + "generate_parameters": generate_parameters, + "truncation": truncation, + } + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="summarization", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=text, + parameters=parameters, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + return SummarizationOutput.parse_obj_as_list(response)[0] + + async def table_question_answering( + self, + table: Dict[str, Any], + query: str, + *, + model: Optional[str] = None, + padding: Optional["Padding"] = None, + sequential: Optional[bool] = None, + truncation: Optional[bool] = None, + ) -> TableQuestionAnsweringOutputElement: + """ + Retrieve the answer to a question from information given in a table. + + Args: + table (`str`): + A table of data represented as a dict of lists where entries are headers and the lists are all the + values, all lists must have the same size. + query (`str`): + The query in plain text that you want to ask the table. + model (`str`): + The model to use for the table-question-answering task. Can be a model ID hosted on the Hugging Face + Hub or a URL to a deployed Inference Endpoint. + padding (`"Padding"`, *optional*): + Activates and controls padding. 
+ sequential (`bool`, *optional*): + Whether to do inference sequentially or as a batch. Batching is faster, but models like SQA require the + inference to be done sequentially to extract relations within sequences, given their conversational + nature. + truncation (`bool`, *optional*): + Activates and controls truncation. + + Returns: + [`TableQuestionAnsweringOutputElement`]: a table question answering output containing the answer, coordinates, cells and the aggregator used. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + >>> query = "How many stars does the transformers repository have?" + >>> table = {"Repository": ["Transformers", "Datasets", "Tokenizers"], "Stars": ["36542", "4512", "3934"]} + >>> await client.table_question_answering(table, query, model="google/tapas-base-finetuned-wtq") + TableQuestionAnsweringOutputElement(answer='36542', coordinates=[[0, 1]], cells=['36542'], aggregator='AVERAGE') + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="table-question-answering", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs={"query": query, "table": table}, + parameters={"model": model, "padding": padding, "sequential": sequential, "truncation": truncation}, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + return TableQuestionAnsweringOutputElement.parse_obj_as_instance(response) + + async def tabular_classification(self, table: Dict[str, Any], *, model: Optional[str] = None) -> List[str]: + """ + Classifying a target category (a group) based on a set of attributes. + + Args: + table (`Dict[str, Any]`): + Set of attributes to classify. + model (`str`, *optional*): + The model to use for the tabular classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. If not provided, the default recommended tabular classification model will be used. + Defaults to None. + + Returns: + `List`: a list of labels, one per row in the initial table. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + >>> table = { + ... "fixed_acidity": ["7.4", "7.8", "10.3"], + ... "volatile_acidity": ["0.7", "0.88", "0.32"], + ... "citric_acid": ["0", "0", "0.45"], + ... "residual_sugar": ["1.9", "2.6", "6.4"], + ... "chlorides": ["0.076", "0.098", "0.073"], + ... "free_sulfur_dioxide": ["11", "25", "5"], + ... "total_sulfur_dioxide": ["34", "67", "13"], + ... "density": ["0.9978", "0.9968", "0.9976"], + ... "pH": ["3.51", "3.2", "3.23"], + ... "sulphates": ["0.56", "0.68", "0.82"], + ... "alcohol": ["9.4", "9.8", "12.6"], + ... 
} + >>> await client.tabular_classification(table=table, model="julien-c/wine-quality") + ["5", "5", "5"] + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="tabular-classification", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=None, + extra_payload={"table": table}, + parameters={}, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + return _bytes_to_list(response) + + async def tabular_regression(self, table: Dict[str, Any], *, model: Optional[str] = None) -> List[float]: + """ + Predicting a numerical target value given a set of attributes/features in a table. + + Args: + table (`Dict[str, Any]`): + Set of attributes stored in a table. The attributes used to predict the target can be both numerical and categorical. + model (`str`, *optional*): + The model to use for the tabular regression task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. If not provided, the default recommended tabular regression model will be used. + Defaults to None. + + Returns: + `List`: a list of predicted numerical target values. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + >>> table = { + ... "Height": ["11.52", "12.48", "12.3778"], + ... "Length1": ["23.2", "24", "23.9"], + ... "Length2": ["25.4", "26.3", "26.5"], + ... "Length3": ["30", "31.2", "31.1"], + ... "Species": ["Bream", "Bream", "Bream"], + ... "Width": ["4.02", "4.3056", "4.6961"], + ... } + >>> await client.tabular_regression(table, model="scikit-learn/Fish-Weight") + [110, 120, 130] + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="tabular-regression", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=None, + parameters={}, + extra_payload={"table": table}, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + return _bytes_to_list(response) + + async def text_classification( + self, + text: str, + *, + model: Optional[str] = None, + top_k: Optional[int] = None, + function_to_apply: Optional["TextClassificationOutputTransform"] = None, + ) -> List[TextClassificationOutputElement]: + """ + Perform text classification (e.g. sentiment-analysis) on the given text. + + Args: + text (`str`): + A string to be classified. + model (`str`, *optional*): + The model to use for the text classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. If not provided, the default recommended text classification model will be used. + Defaults to None. + top_k (`int`, *optional*): + When specified, limits the output to the top K most probable classes. + function_to_apply (`"TextClassificationOutputTransform"`, *optional*): + The function to apply to the model outputs in order to retrieve the scores. + + Returns: + `List[TextClassificationOutputElement]`: a list of [`TextClassificationOutputElement`] items containing the predicted label and associated probability. 
+ + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + >>> await client.text_classification("I like you") + [ + TextClassificationOutputElement(label='POSITIVE', score=0.9998695850372314), + TextClassificationOutputElement(label='NEGATIVE', score=0.0001304351753788069), + ] + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="text-classification", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=text, + parameters={ + "function_to_apply": function_to_apply, + "top_k": top_k, + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + return TextClassificationOutputElement.parse_obj_as_list(response)[0] # type: ignore [return-value] + + @overload + async def text_generation( + self, + prompt: str, + *, + details: Literal[True], + stream: Literal[True], + model: Optional[str] = None, + # Parameters from `TextGenerationInputGenerateParameters` (maintained manually) + adapter_id: Optional[str] = None, + best_of: Optional[int] = None, + decoder_input_details: Optional[bool] = None, + do_sample: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + grammar: Optional[TextGenerationInputGrammarType] = None, + max_new_tokens: Optional[int] = None, + repetition_penalty: Optional[float] = None, + return_full_text: Optional[bool] = None, + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stop_sequences: Optional[List[str]] = None, # Deprecated, use `stop` instead + temperature: Optional[float] = None, + top_k: Optional[int] = None, + top_n_tokens: Optional[int] = None, + top_p: Optional[float] = None, + truncate: Optional[int] = None, + typical_p: Optional[float] = None, + watermark: Optional[bool] = None, + ) -> AsyncIterable[TextGenerationStreamOutput]: ... + + @overload + async def text_generation( + self, + prompt: str, + *, + details: Literal[True], + stream: Optional[Literal[False]] = None, + model: Optional[str] = None, + # Parameters from `TextGenerationInputGenerateParameters` (maintained manually) + adapter_id: Optional[str] = None, + best_of: Optional[int] = None, + decoder_input_details: Optional[bool] = None, + do_sample: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + grammar: Optional[TextGenerationInputGrammarType] = None, + max_new_tokens: Optional[int] = None, + repetition_penalty: Optional[float] = None, + return_full_text: Optional[bool] = None, + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stop_sequences: Optional[List[str]] = None, # Deprecated, use `stop` instead + temperature: Optional[float] = None, + top_k: Optional[int] = None, + top_n_tokens: Optional[int] = None, + top_p: Optional[float] = None, + truncate: Optional[int] = None, + typical_p: Optional[float] = None, + watermark: Optional[bool] = None, + ) -> TextGenerationOutput: ... 
+ + @overload + async def text_generation( + self, + prompt: str, + *, + details: Optional[Literal[False]] = None, + stream: Literal[True], + model: Optional[str] = None, + # Parameters from `TextGenerationInputGenerateParameters` (maintained manually) + adapter_id: Optional[str] = None, + best_of: Optional[int] = None, + decoder_input_details: Optional[bool] = None, + do_sample: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + grammar: Optional[TextGenerationInputGrammarType] = None, + max_new_tokens: Optional[int] = None, + repetition_penalty: Optional[float] = None, + return_full_text: Optional[bool] = None, # Manual default value + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stop_sequences: Optional[List[str]] = None, # Deprecated, use `stop` instead + temperature: Optional[float] = None, + top_k: Optional[int] = None, + top_n_tokens: Optional[int] = None, + top_p: Optional[float] = None, + truncate: Optional[int] = None, + typical_p: Optional[float] = None, + watermark: Optional[bool] = None, + ) -> AsyncIterable[str]: ... + + @overload + async def text_generation( + self, + prompt: str, + *, + details: Optional[Literal[False]] = None, + stream: Optional[Literal[False]] = None, + model: Optional[str] = None, + # Parameters from `TextGenerationInputGenerateParameters` (maintained manually) + adapter_id: Optional[str] = None, + best_of: Optional[int] = None, + decoder_input_details: Optional[bool] = None, + do_sample: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + grammar: Optional[TextGenerationInputGrammarType] = None, + max_new_tokens: Optional[int] = None, + repetition_penalty: Optional[float] = None, + return_full_text: Optional[bool] = None, + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stop_sequences: Optional[List[str]] = None, # Deprecated, use `stop` instead + temperature: Optional[float] = None, + top_k: Optional[int] = None, + top_n_tokens: Optional[int] = None, + top_p: Optional[float] = None, + truncate: Optional[int] = None, + typical_p: Optional[float] = None, + watermark: Optional[bool] = None, + ) -> str: ... + + @overload + async def text_generation( + self, + prompt: str, + *, + details: Optional[bool] = None, + stream: Optional[bool] = None, + model: Optional[str] = None, + # Parameters from `TextGenerationInputGenerateParameters` (maintained manually) + adapter_id: Optional[str] = None, + best_of: Optional[int] = None, + decoder_input_details: Optional[bool] = None, + do_sample: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + grammar: Optional[TextGenerationInputGrammarType] = None, + max_new_tokens: Optional[int] = None, + repetition_penalty: Optional[float] = None, + return_full_text: Optional[bool] = None, + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stop_sequences: Optional[List[str]] = None, # Deprecated, use `stop` instead + temperature: Optional[float] = None, + top_k: Optional[int] = None, + top_n_tokens: Optional[int] = None, + top_p: Optional[float] = None, + truncate: Optional[int] = None, + typical_p: Optional[float] = None, + watermark: Optional[bool] = None, + ) -> Union[str, TextGenerationOutput, AsyncIterable[str], AsyncIterable[TextGenerationStreamOutput]]: ... 
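+ # NOTE: the overloads above exist only to give type checkers a precise return type for each + # (details, stream) flag combination; the single implementation below handles every case at runtime.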
+ + async def text_generation( + self, + prompt: str, + *, + details: Optional[bool] = None, + stream: Optional[bool] = None, + model: Optional[str] = None, + # Parameters from `TextGenerationInputGenerateParameters` (maintained manually) + adapter_id: Optional[str] = None, + best_of: Optional[int] = None, + decoder_input_details: Optional[bool] = None, + do_sample: Optional[bool] = None, + frequency_penalty: Optional[float] = None, + grammar: Optional[TextGenerationInputGrammarType] = None, + max_new_tokens: Optional[int] = None, + repetition_penalty: Optional[float] = None, + return_full_text: Optional[bool] = None, + seed: Optional[int] = None, + stop: Optional[List[str]] = None, + stop_sequences: Optional[List[str]] = None, # Deprecated, use `stop` instead + temperature: Optional[float] = None, + top_k: Optional[int] = None, + top_n_tokens: Optional[int] = None, + top_p: Optional[float] = None, + truncate: Optional[int] = None, + typical_p: Optional[float] = None, + watermark: Optional[bool] = None, + ) -> Union[str, TextGenerationOutput, AsyncIterable[str], AsyncIterable[TextGenerationStreamOutput]]: + """ + Given a prompt, generate the following text. + + > [!TIP] + > If you want to generate a response from chat messages, you should use the [`InferenceClient.chat_completion`] method. + > It accepts a list of messages instead of a single text prompt and handles the chat templating for you. + + Args: + prompt (`str`): + Input text. + details (`bool`, *optional*): + By default, text_generation returns a string. Pass `details=True` if you want a detailed output (tokens, + probabilities, seed, finish reason, etc.). Only available for models running with the + `text-generation-inference` backend. + stream (`bool`, *optional*): + By default, text_generation returns the full generated text. Pass `stream=True` if you want a stream of + tokens to be returned. Only available for models running with the `text-generation-inference` + backend. + model (`str`, *optional*): + The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed + Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None. + adapter_id (`str`, *optional*): + LoRA adapter id. + best_of (`int`, *optional*): + Generate best_of sequences and return the one with the highest token logprobs. + decoder_input_details (`bool`, *optional*): + Return the decoder input token logprobs and ids. You must set `details=True` as well for it to be taken + into account. Defaults to `False`. + do_sample (`bool`, *optional*): + Activate logits sampling. + frequency_penalty (`float`, *optional*): + Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in + the text so far, decreasing the model's likelihood to repeat the same line verbatim. + grammar ([`TextGenerationInputGrammarType`], *optional*): + Grammar constraints. Can be either a JSONSchema or a regex. + max_new_tokens (`int`, *optional*): + Maximum number of generated tokens. Defaults to 100. + repetition_penalty (`float`, *optional*): + The parameter for repetition penalty. 1.0 means no penalty. See [this + paper](https://arxiv.org/pdf/1909.05858.pdf) for more details. + return_full_text (`bool`, *optional*): + Whether to prepend the prompt to the generated text. + seed (`int`, *optional*): + Random sampling seed. + stop (`List[str]`, *optional*): + Stop generating tokens if a member of `stop` is generated.
+ stop_sequences (`List[str]`, *optional*): + Deprecated argument. Use `stop` instead. + temperature (`float`, *optional*): + The value used to modulate the logits distribution. + top_n_tokens (`int`, *optional*): + Return information about the `top_n_tokens` most likely tokens at each generation step, instead of + just the sampled token. + top_k (`int`, *optional*): + The number of highest probability vocabulary tokens to keep for top-k-filtering. + top_p (`float`, *optional*): + If set to < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or + higher are kept for generation. + truncate (`int`, *optional*): + Truncate inputs tokens to the given size. + typical_p (`float`, *optional*): + Typical Decoding mass. + See [Typical Decoding for Natural Language Generation](https://arxiv.org/abs/2202.00666) for more information. + watermark (`bool`, *optional*): + Watermarking with [A Watermark for Large Language Models](https://arxiv.org/abs/2301.10226). + + Returns: + `Union[str, TextGenerationOutput, AsyncIterable[str], AsyncIterable[TextGenerationStreamOutput]]`: + Generated text returned from the server: + - if `stream=False` and `details=False`, the generated text is returned as a `str` (default) + - if `stream=True` and `details=False`, the generated text is returned token by token as an `AsyncIterable[str]` + - if `stream=False` and `details=True`, the generated text is returned with more details as a [`~huggingface_hub.TextGenerationOutput`] + - if `details=True` and `stream=True`, the generated text is returned token by token as an async iterable of [`~huggingface_hub.TextGenerationStreamOutput`] + + Raises: + `ValidationError`: + If input values are not valid. No HTTP call is made to the server. + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + + # Case 1: generate text + >>> await client.text_generation("The huggingface_hub library is ", max_new_tokens=12) + '100% open source and built to be easy to use.' + + # Case 2: iterate over the generated tokens. Useful for large generation. + >>> async for token in await client.text_generation("The huggingface_hub library is ", max_new_tokens=12, stream=True): + ... print(token) + 100 + % + open + source + and + built + to + be + easy + to + use + . + + # Case 3: get more details about the generation process. + >>> await client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True) + TextGenerationOutput( + generated_text='100% open source and built to be easy to use.', + details=TextGenerationDetails( + finish_reason='length', + generated_tokens=12, + seed=None, + prefill=[ + TextGenerationPrefillOutputToken(id=487, text='The', logprob=None), + TextGenerationPrefillOutputToken(id=53789, text=' hugging', logprob=-13.171875), + (...) + TextGenerationPrefillOutputToken(id=204, text=' ', logprob=-7.0390625) + ], + tokens=[ + TokenElement(id=1425, text='100', logprob=-1.0175781, special=False), + TokenElement(id=16, text='%', logprob=-0.0463562, special=False), + (...) + TokenElement(id=25, text='.', logprob=-0.5703125, special=False) + ], + best_of_sequences=None + ) + ) + + # Case 4: iterate over the generated tokens with more details.
+ # Last object is more complete, containing the full generated text and the finish reason. + >>> async for details in await client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True, stream=True): + ... print(details) + ... + TextGenerationStreamOutput(token=TokenElement(id=1425, text='100', logprob=-1.0175781, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=16, text='%', logprob=-0.0463562, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=1314, text=' open', logprob=-1.3359375, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=3178, text=' source', logprob=-0.28100586, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=273, text=' and', logprob=-0.5961914, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=3426, text=' built', logprob=-1.9423828, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=271, text=' to', logprob=-1.4121094, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=314, text=' be', logprob=-1.5224609, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=1833, text=' easy', logprob=-2.1132812, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=271, text=' to', logprob=-0.08520508, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement(id=745, text=' use', logprob=-0.39453125, special=False), generated_text=None, details=None) + TextGenerationStreamOutput(token=TokenElement( + id=25, + text='.', + logprob=-0.5703125, + special=False), + generated_text='100% open source and built to be easy to use.', + details=TextGenerationStreamOutputStreamDetails(finish_reason='length', generated_tokens=12, seed=None) + ) + + # Case 5: generate constrained output using grammar + >>> response = await client.text_generation( + ... prompt="I saw a puppy a cat and a raccoon during my bike ride in the park", + ... model="HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1", + ... max_new_tokens=100, + ... repetition_penalty=1.3, + ... grammar={ + ... "type": "json", + ... "value": { + ... "properties": { + ... "location": {"type": "string"}, + ... "activity": {"type": "string"}, + ... "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5}, + ... "animals": {"type": "array", "items": {"type": "string"}}, + ... }, + ... "required": ["location", "activity", "animals_seen", "animals"], + ... }, + ... }, + ... ) + >>> json.loads(response) + { + "activity": "bike riding", + "animals": ["puppy", "cat", "raccoon"], + "animals_seen": 3, + "location": "park" + } + ``` + """ + if decoder_input_details and not details: + warnings.warn( + "`decoder_input_details=True` has been passed to the server but `details=False` is set meaning that" + " the output from the server will be truncated." + ) + decoder_input_details = False + + if stop_sequences is not None: + warnings.warn( + "`stop_sequences` is a deprecated argument for `text_generation` task" + " and will be removed in version '0.28.0'. 
Use `stop` instead.", + FutureWarning, + ) + if stop is None: + stop = stop_sequences # use deprecated arg if provided + + # Build payload + parameters = { + "adapter_id": adapter_id, + "best_of": best_of, + "decoder_input_details": decoder_input_details, + "details": details, + "do_sample": do_sample, + "frequency_penalty": frequency_penalty, + "grammar": grammar, + "max_new_tokens": max_new_tokens, + "repetition_penalty": repetition_penalty, + "return_full_text": return_full_text, + "seed": seed, + "stop": stop, + "temperature": temperature, + "top_k": top_k, + "top_n_tokens": top_n_tokens, + "top_p": top_p, + "truncate": truncate, + "typical_p": typical_p, + "watermark": watermark, + } + + # Remove some parameters if not a TGI server + unsupported_kwargs = _get_unsupported_text_generation_kwargs(model) + if len(unsupported_kwargs) > 0: + # The server does not support some parameters + # => means it is not a TGI server + # => remove unsupported parameters and warn the user + + ignored_parameters = [] + for key in unsupported_kwargs: + if parameters.get(key): + ignored_parameters.append(key) + parameters.pop(key, None) + if len(ignored_parameters) > 0: + warnings.warn( + "API endpoint/model for text-generation is not served via TGI. Ignoring the following parameters:" + f" {', '.join(ignored_parameters)}.", + UserWarning, + ) + if details: + warnings.warn( + "API endpoint/model for text-generation is not served via TGI. Parameter `details=True` will" + " be ignored, meaning only the generated text will be returned.", + UserWarning, + ) + details = False + if stream: + raise ValueError( + "API endpoint/model for text-generation is not served via TGI. Cannot return output as a stream." + " Please pass `stream=False` as input." + ) + + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="text-generation", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=prompt, + parameters=parameters, + extra_payload={"stream": stream}, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + + # Handle errors separately for more precise error messages + try: + bytes_output = await self._inner_post(request_parameters, stream=stream or False) + except _import_aiohttp().ClientResponseError as e: + match = MODEL_KWARGS_NOT_USED_REGEX.search(e.response_error_payload["error"]) + if e.status == 400 and match: + unused_params = [kwarg.strip("' ") for kwarg in match.group(1).split(",")] + _set_unsupported_text_generation_kwargs(model, unused_params) + return await self.text_generation( # type: ignore + prompt=prompt, + details=details, + stream=stream, + model=model_id, + adapter_id=adapter_id, + best_of=best_of, + decoder_input_details=decoder_input_details, + do_sample=do_sample, + frequency_penalty=frequency_penalty, + grammar=grammar, + max_new_tokens=max_new_tokens, + repetition_penalty=repetition_penalty, + return_full_text=return_full_text, + seed=seed, + stop=stop, + temperature=temperature, + top_k=top_k, + top_n_tokens=top_n_tokens, + top_p=top_p, + truncate=truncate, + typical_p=typical_p, + watermark=watermark, + ) + raise_text_generation_error(e) + + # Parse output + if stream: + return _async_stream_text_generation_response(bytes_output, details) # type: ignore + + data = _bytes_to_dict(bytes_output) # type: ignore[arg-type] + + # Data can be a single element (dict) or a list of dicts, in which case we select the first element.
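+ # e.g. a transformers-backed endpoint may return b'[{"generated_text": "..."}]', which parses to a one-element list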
+ if isinstance(data, list): + data = data[0] + response = provider_helper.get_response(data, request_parameters) + return TextGenerationOutput.parse_obj_as_instance(response) if details else response["generated_text"] + + async def text_to_image( + self, + prompt: str, + *, + negative_prompt: Optional[str] = None, + height: Optional[int] = None, + width: Optional[int] = None, + num_inference_steps: Optional[int] = None, + guidance_scale: Optional[float] = None, + model: Optional[str] = None, + scheduler: Optional[str] = None, + seed: Optional[int] = None, + extra_body: Optional[Dict[str, Any]] = None, + ) -> "Image": + """ + Generate an image based on a given text using a specified model. + + > [!WARNING] + > You must have `PIL` installed if you want to work with images (`pip install Pillow`). + + > [!TIP] + > You can pass provider-specific parameters to the model by using the `extra_body` argument. + + Args: + prompt (`str`): + The prompt to generate an image from. + negative_prompt (`str`, *optional*): + One prompt to guide what NOT to include in image generation. + height (`int`, *optional*): + The height in pixels of the output image + width (`int`, *optional*): + The width in pixels of the output image + num_inference_steps (`int`, *optional*): + The number of denoising steps. More denoising steps usually lead to a higher quality image at the + expense of slower inference. + guidance_scale (`float`, *optional*): + A higher guidance scale value encourages the model to generate images closely linked to the text + prompt, but values too high may cause saturation and other artifacts. + model (`str`, *optional*): + The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed + Inference Endpoint. If not provided, the default recommended text-to-image model will be used. + Defaults to None. + scheduler (`str`, *optional*): + Override the scheduler with a compatible one. + seed (`int`, *optional*): + Seed for the random number generator. + extra_body (`Dict[str, Any]`, *optional*): + Additional provider-specific parameters to pass to the model. Refer to the provider's documentation + for supported parameters. + + Returns: + `Image`: The generated image. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + + >>> image = await client.text_to_image("An astronaut riding a horse on the moon.") + >>> image.save("astronaut.png") + + >>> image = await client.text_to_image( + ... "An astronaut riding a horse on the moon.", + ... negative_prompt="low resolution, blurry", + ... model="stabilityai/stable-diffusion-2-1", + ... ) + >>> image.save("better_astronaut.png") + ``` + Example using a third-party provider directly. Usage will be billed on your fal.ai account. + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient( + ... provider="fal-ai", # Use fal.ai provider + ... api_key="fal-ai-api-key", # Pass your fal.ai API key + ... ) + >>> image = client.text_to_image( + ... "A majestic lion in a fantasy forest", + ... model="black-forest-labs/FLUX.1-schnell", + ... ) + >>> image.save("lion.png") + ``` + + Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account. 
+ ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient( + ... provider="replicate", # Use replicate provider + ... api_key="hf_...", # Pass your HF token + ... ) + >>> image = client.text_to_image( + ... "An astronaut riding a horse on the moon.", + ... model="black-forest-labs/FLUX.1-dev", + ... ) + >>> image.save("astronaut.png") + ``` + + Example using Replicate provider with extra parameters + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient( + ... provider="replicate", # Use replicate provider + ... api_key="hf_...", # Pass your HF token + ... ) + >>> image = client.text_to_image( + ... "An astronaut riding a horse on the moon.", + ... model="black-forest-labs/FLUX.1-schnell", + ... extra_body={"output_quality": 100}, + ... ) + >>> image.save("astronaut.png") + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="text-to-image", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=prompt, + parameters={ + "negative_prompt": negative_prompt, + "height": height, + "width": width, + "num_inference_steps": num_inference_steps, + "guidance_scale": guidance_scale, + "scheduler": scheduler, + "seed": seed, + **(extra_body or {}), + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + response = provider_helper.get_response(response) + return _bytes_to_image(response) + + async def text_to_video( + self, + prompt: str, + *, + model: Optional[str] = None, + guidance_scale: Optional[float] = None, + negative_prompt: Optional[List[str]] = None, + num_frames: Optional[float] = None, + num_inference_steps: Optional[int] = None, + seed: Optional[int] = None, + extra_body: Optional[Dict[str, Any]] = None, + ) -> bytes: + """ + Generate a video based on a given text. + + > [!TIP] + > You can pass provider-specific parameters to the model by using the `extra_body` argument. + + Args: + prompt (`str`): + The prompt to generate a video from. + model (`str`, *optional*): + The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed + Inference Endpoint. If not provided, the default recommended text-to-video model will be used. + Defaults to None. + guidance_scale (`float`, *optional*): + A higher guidance scale value encourages the model to generate videos closely linked to the text + prompt, but values too high may cause saturation and other artifacts. + negative_prompt (`List[str]`, *optional*): + One or several prompts to guide what NOT to include in video generation. + num_frames (`float`, *optional*): + The num_frames parameter determines how many video frames are generated. + num_inference_steps (`int`, *optional*): + The number of denoising steps. More denoising steps usually lead to a higher quality video at the + expense of slower inference. + seed (`int`, *optional*): + Seed for the random number generator. + extra_body (`Dict[str, Any]`, *optional*): + Additional provider-specific parameters to pass to the model. Refer to the provider's documentation + for supported parameters. + + Returns: + `bytes`: The generated video. + + Example: + + Example using a third-party provider directly. Usage will be billed on your fal.ai account. + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient( + ... provider="fal-ai", # Using fal.ai provider + ... api_key="fal-ai-api-key", # Pass your fal.ai API key + ...
) + >>> video = client.text_to_video( + ... "A majestic lion running in a fantasy forest", + ... model="tencent/HunyuanVideo", + ... ) + >>> with open("lion.mp4", "wb") as file: + ... file.write(video) + ``` + + Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account. + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient( + ... provider="replicate", # Using replicate provider + ... api_key="hf_...", # Pass your HF token + ... ) + >>> video = client.text_to_video( + ... "A cat running in a park", + ... model="genmo/mochi-1-preview", + ... ) + >>> with open("cat.mp4", "wb") as file: + ... file.write(video) + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="text-to-video", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=prompt, + parameters={ + "guidance_scale": guidance_scale, + "negative_prompt": negative_prompt, + "num_frames": num_frames, + "num_inference_steps": num_inference_steps, + "seed": seed, + **(extra_body or {}), + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + response = provider_helper.get_response(response, request_parameters) + return response + + async def text_to_speech( + self, + text: str, + *, + model: Optional[str] = None, + do_sample: Optional[bool] = None, + early_stopping: Optional[Union[bool, "TextToSpeechEarlyStoppingEnum"]] = None, + epsilon_cutoff: Optional[float] = None, + eta_cutoff: Optional[float] = None, + max_length: Optional[int] = None, + max_new_tokens: Optional[int] = None, + min_length: Optional[int] = None, + min_new_tokens: Optional[int] = None, + num_beam_groups: Optional[int] = None, + num_beams: Optional[int] = None, + penalty_alpha: Optional[float] = None, + temperature: Optional[float] = None, + top_k: Optional[int] = None, + top_p: Optional[float] = None, + typical_p: Optional[float] = None, + use_cache: Optional[bool] = None, + extra_body: Optional[Dict[str, Any]] = None, + ) -> bytes: + """ + Synthesize an audio of a voice pronouncing a given text. + + > [!TIP] + > You can pass provider-specific parameters to the model by using the `extra_body` argument. + + Args: + text (`str`): + The text to synthesize. + model (`str`, *optional*): + The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed + Inference Endpoint. If not provided, the default recommended text-to-speech model will be used. + Defaults to None. + do_sample (`bool`, *optional*): + Whether to use sampling instead of greedy decoding when generating new tokens. + early_stopping (`Union[bool, "TextToSpeechEarlyStoppingEnum"]`, *optional*): + Controls the stopping condition for beam-based methods. + epsilon_cutoff (`float`, *optional*): + If set to float strictly between 0 and 1, only tokens with a conditional probability greater than + epsilon_cutoff will be sampled. In the paper, suggested values range from 3e-4 to 9e-4, depending on + the size of the model. See [Truncation Sampling as Language Model + Desmoothing](https://hf.co/papers/2210.15191) for more details. + eta_cutoff (`float`, *optional*): + Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to float strictly + between 0 and 1, a token is only considered if it is greater than either eta_cutoff or sqrt(eta_cutoff) + * exp(-entropy(softmax(next_token_logits))). 
The latter term is intuitively the expected next token + probability, scaled by sqrt(eta_cutoff). In the paper, suggested values range from 3e-4 to 2e-3, + depending on the size of the model. See [Truncation Sampling as Language Model + Desmoothing](https://hf.co/papers/2210.15191) for more details. + max_length (`int`, *optional*): + The maximum length (in tokens) of the generated text, including the input. + max_new_tokens (`int`, *optional*): + The maximum number of tokens to generate. Takes precedence over max_length. + min_length (`int`, *optional*): + The minimum length (in tokens) of the generated text, including the input. + min_new_tokens (`int`, *optional*): + The minimum number of tokens to generate. Takes precedence over min_length. + num_beam_groups (`int`, *optional*): + Number of groups to divide num_beams into in order to ensure diversity among different groups of beams. + See [this paper](https://hf.co/papers/1610.02424) for more details. + num_beams (`int`, *optional*): + Number of beams to use for beam search. + penalty_alpha (`float`, *optional*): + The value balances the model confidence and the degeneration penalty in contrastive search decoding. + temperature (`float`, *optional*): + The value used to modulate the next token probabilities. + top_k (`int`, *optional*): + The number of highest probability vocabulary tokens to keep for top-k-filtering. + top_p (`float`, *optional*): + If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to + top_p or higher are kept for generation. + typical_p (`float`, *optional*): + Local typicality measures how similar the conditional probability of predicting a target token next is + to the expected conditional probability of predicting a random token next, given the partial text + already generated. If set to float < 1, the smallest set of the most locally typical tokens with + probabilities that add up to typical_p or higher are kept for generation. See [this + paper](https://hf.co/papers/2202.00666) for more details. + use_cache (`bool`, *optional*): + Whether the model should use the past key/values attentions to speed up decoding. + extra_body (`Dict[str, Any]`, *optional*): + Additional provider-specific parameters to pass to the model. Refer to the provider's documentation + for supported parameters. + + Returns: + `bytes`: The generated audio. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + # Must be run in an async context + >>> from pathlib import Path + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + + >>> audio = await client.text_to_speech("Hello world") + >>> Path("hello_world.flac").write_bytes(audio) + ``` + + Example using a third-party provider directly. Usage will be billed on your Replicate account. + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient( + ... provider="replicate", + ... api_key="your-replicate-api-key", # Pass your Replicate API key directly + ... ) + >>> audio = client.text_to_speech( + ... text="Hello world", + ... model="OuteAI/OuteTTS-0.3-500M", + ... ) + >>> Path("hello_world.flac").write_bytes(audio) + ``` + + Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.
+ + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient( + ... provider="replicate", + ... api_key="hf_...", # Pass your HF token + ... ) + >>> audio = client.text_to_speech( + ... text="Hello world", + ... model="OuteAI/OuteTTS-0.3-500M", + ... ) + >>> Path("hello_world.flac").write_bytes(audio) + ``` + Example using Replicate provider with extra parameters + ```py + >>> from huggingface_hub import InferenceClient + >>> client = InferenceClient( + ... provider="replicate", # Use replicate provider + ... api_key="hf_...", # Pass your HF token + ... ) + >>> audio = client.text_to_speech( + ... "Hello, my name is Kororo, an awesome text-to-speech model.", + ... model="hexgrad/Kokoro-82M", + ... extra_body={"voice": "af_nicole"}, + ... ) + >>> Path("hello.flac").write_bytes(audio) + ``` + + Example music-gen using "YuE-s1-7B-anneal-en-cot" on fal.ai + ```py + >>> from huggingface_hub import InferenceClient + >>> lyrics = ''' + ... [verse] + ... In the town where I was born + ... Lived a man who sailed to sea + ... And he told us of his life + ... In the land of submarines + ... So we sailed on to the sun + ... 'Til we found a sea of green + ... And we lived beneath the waves + ... In our yellow submarine + + ... [chorus] + ... We all live in a yellow submarine + ... Yellow submarine, yellow submarine + ... We all live in a yellow submarine + ... Yellow submarine, yellow submarine + ... ''' + >>> genres = "pavarotti-style tenor voice" + >>> client = InferenceClient( + ... provider="fal-ai", + ... model="m-a-p/YuE-s1-7B-anneal-en-cot", + ... api_key=..., + ... ) + >>> audio = client.text_to_speech(lyrics, extra_body={"genres": genres}) + >>> with open("output.mp3", "wb") as f: + ... f.write(audio) + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="text-to-speech", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=text, + parameters={ + "do_sample": do_sample, + "early_stopping": early_stopping, + "epsilon_cutoff": epsilon_cutoff, + "eta_cutoff": eta_cutoff, + "max_length": max_length, + "max_new_tokens": max_new_tokens, + "min_length": min_length, + "min_new_tokens": min_new_tokens, + "num_beam_groups": num_beam_groups, + "num_beams": num_beams, + "penalty_alpha": penalty_alpha, + "temperature": temperature, + "top_k": top_k, + "top_p": top_p, + "typical_p": typical_p, + "use_cache": use_cache, + **(extra_body or {}), + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + response = provider_helper.get_response(response) + return response + + async def token_classification( + self, + text: str, + *, + model: Optional[str] = None, + aggregation_strategy: Optional["TokenClassificationAggregationStrategy"] = None, + ignore_labels: Optional[List[str]] = None, + stride: Optional[int] = None, + ) -> List[TokenClassificationOutputElement]: + """ + Perform token classification on the given text. + Usually used for sentence parsing, either grammatical or Named Entity Recognition (NER), to understand keywords contained within text. + + Args: + text (`str`): + A string to be classified. + model (`str`, *optional*): + The model to use for the token classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. If not provided, the default recommended token classification model will be used. + Defaults to None.
+ aggregation_strategy (`"TokenClassificationAggregationStrategy"`, *optional*): + The strategy used to fuse tokens based on model predictions. + ignore_labels (`List[str]`, *optional*): + A list of labels to ignore. + stride (`int`, *optional*): + The number of overlapping tokens between chunks when splitting the input text. + + Returns: + `List[TokenClassificationOutputElement]`: List of [`TokenClassificationOutputElement`] items containing the entity group, confidence score, word, start and end index. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + >>> await client.token_classification("My name is Sarah Jessica Parker but you can call me Jessica") + [ + TokenClassificationOutputElement( + entity_group='PER', + score=0.9971321225166321, + word='Sarah Jessica Parker', + start=11, + end=31, + ), + TokenClassificationOutputElement( + entity_group='PER', + score=0.9773476123809814, + word='Jessica', + start=52, + end=59, + ) + ] + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="token-classification", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=text, + parameters={ + "aggregation_strategy": aggregation_strategy, + "ignore_labels": ignore_labels, + "stride": stride, + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + return TokenClassificationOutputElement.parse_obj_as_list(response) + + async def translation( + self, + text: str, + *, + model: Optional[str] = None, + src_lang: Optional[str] = None, + tgt_lang: Optional[str] = None, + clean_up_tokenization_spaces: Optional[bool] = None, + truncation: Optional["TranslationTruncationStrategy"] = None, + generate_parameters: Optional[Dict[str, Any]] = None, + ) -> TranslationOutput: + """ + Convert text from one language to another. + + Check out https://huggingface.co/tasks/translation for more information on how to choose the best model for + your specific use case. Source and target languages usually depend on the model. + However, it is possible to specify source and target languages for certain models. If you are working with one of these models, + you can use `src_lang` and `tgt_lang` arguments to pass the relevant information. + + Args: + text (`str`): + A string to be translated. + model (`str`, *optional*): + The model to use for the translation task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. If not provided, the default recommended translation model will be used. + Defaults to None. + src_lang (`str`, *optional*): + The source language of the text. Required for models that can translate from multiple languages. + tgt_lang (`str`, *optional*): + Target language to translate to. Required for models that can translate to multiple languages. + clean_up_tokenization_spaces (`bool`, *optional*): + Whether to clean up the potential extra spaces in the text output. + truncation (`"TranslationTruncationStrategy"`, *optional*): + The truncation strategy to use. + generate_parameters (`Dict[str, Any]`, *optional*): + Additional parametrization of the text generation algorithm.
+ + Returns: + [`TranslationOutput`]: The generated translated text. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. + `ValueError`: + If only one of the `src_lang` and `tgt_lang` arguments is provided. + + Example: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + >>> await client.translation("My name is Wolfgang and I live in Berlin") + 'Mein Name ist Wolfgang und ich lebe in Berlin.' + >>> await client.translation("My name is Wolfgang and I live in Berlin", model="Helsinki-NLP/opus-mt-en-fr") + TranslationOutput(translation_text="Je m'appelle Wolfgang et je vis à Berlin.") + ``` + + Specifying languages: + ```py + >>> client.translation("My name is Sarah Jessica Parker but you can call me Jessica", model="facebook/mbart-large-50-many-to-many-mmt", src_lang="en_XX", tgt_lang="fr_XX") + "Mon nom est Sarah Jessica Parker mais vous pouvez m'appeler Jessica" + ``` + """ + # Throw error if only one of `src_lang` and `tgt_lang` was given + if src_lang is not None and tgt_lang is None: + raise ValueError("You cannot specify `src_lang` without specifying `tgt_lang`.") + + if src_lang is None and tgt_lang is not None: + raise ValueError("You cannot specify `tgt_lang` without specifying `src_lang`.") + + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="translation", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=text, + parameters={ + "src_lang": src_lang, + "tgt_lang": tgt_lang, + "clean_up_tokenization_spaces": clean_up_tokenization_spaces, + "truncation": truncation, + "generate_parameters": generate_parameters, + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + return TranslationOutput.parse_obj_as_list(response)[0] + + async def visual_question_answering( + self, + image: ContentT, + question: str, + *, + model: Optional[str] = None, + top_k: Optional[int] = None, + ) -> List[VisualQuestionAnsweringOutputElement]: + """ + Answering open-ended questions based on an image. + + Args: + image (`Union[str, Path, bytes, BinaryIO, PIL.Image.Image]`): + The input image for the context. It can be raw bytes, an image file, a URL to an online image, or a PIL Image. + question (`str`): + Question to be answered. + model (`str`, *optional*): + The model to use for the visual question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to + a deployed Inference Endpoint. If not provided, the default recommended visual question answering model will be used. + Defaults to None. + top_k (`int`, *optional*): + The number of answers to return (will be chosen by order of likelihood). Note that we return fewer than + `top_k` answers if there are not enough options available within the context. + + Returns: + `List[VisualQuestionAnsweringOutputElement]`: a list of [`VisualQuestionAnsweringOutputElement`] items containing the predicted label and associated probability. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503.
+ + Example: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + >>> await client.visual_question_answering( + ... image="https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg", + ... question="What is the animal doing?" + ... ) + [ + VisualQuestionAnsweringOutputElement(score=0.778609573841095, answer='laying down'), + VisualQuestionAnsweringOutputElement(score=0.6957435607910156, answer='sitting'), + ] + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="visual-question-answering", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=image, + parameters={"top_k": top_k}, + headers=self.headers, + model=model_id, + api_key=self.token, + extra_payload={"question": question, "image": _b64_encode(image)}, + ) + response = await self._inner_post(request_parameters) + return VisualQuestionAnsweringOutputElement.parse_obj_as_list(response) + + async def zero_shot_classification( + self, + text: str, + candidate_labels: List[str], + *, + multi_label: Optional[bool] = False, + hypothesis_template: Optional[str] = None, + model: Optional[str] = None, + ) -> List[ZeroShotClassificationOutputElement]: + """ + Provide as input a text and a set of candidate labels to classify the input text. + + Args: + text (`str`): + The input text to classify. + candidate_labels (`List[str]`): + The set of possible class labels to classify the text into. + multi_label (`bool`, *optional*): + Whether multiple candidate labels can be true. If false, the scores are normalized such that the sum of + the label likelihoods for each sequence is 1. If true, the labels are considered independent and + probabilities are normalized for each candidate. + hypothesis_template (`str`, *optional*): + The sentence used in conjunction with `candidate_labels` to attempt the text classification by + replacing the placeholder with the candidate labels. + model (`str`, *optional*): + The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed + Inference Endpoint. This parameter overrides the model defined at the instance level. If not provided, the default recommended zero-shot classification model will be used. + + Returns: + `List[ZeroShotClassificationOutputElement]`: List of [`ZeroShotClassificationOutputElement`] items containing the predicted labels and their confidence. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example with `multi_label=False`: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + >>> text = ( + ... "A new model offers an explanation for how the Galilean satellites formed around the solar system's" + ... " largest world. Konstantin Batygin did not set out to solve one of the solar system's most puzzling" + ... " mysteries when he went for a run up a hill in Nice, France." + ...
) + >>> labels = ["space & cosmos", "scientific discovery", "microbiology", "robots", "archeology"] + >>> await client.zero_shot_classification(text, labels) + [ + ZeroShotClassificationOutputElement(label='scientific discovery', score=0.7961668968200684), + ZeroShotClassificationOutputElement(label='space & cosmos', score=0.18570658564567566), + ZeroShotClassificationOutputElement(label='microbiology', score=0.00730885099619627), + ZeroShotClassificationOutputElement(label='archeology', score=0.006258360575884581), + ZeroShotClassificationOutputElement(label='robots', score=0.004559356719255447), + ] + >>> await client.zero_shot_classification(text, labels, multi_label=True) + [ + ZeroShotClassificationOutputElement(label='scientific discovery', score=0.9829297661781311), + ZeroShotClassificationOutputElement(label='space & cosmos', score=0.755190908908844), + ZeroShotClassificationOutputElement(label='microbiology', score=0.0005462635890580714), + ZeroShotClassificationOutputElement(label='archeology', score=0.00047131875180639327), + ZeroShotClassificationOutputElement(label='robots', score=0.00030448526376858354), + ] + ``` + + Example with `multi_label=True` and a custom `hypothesis_template`: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + >>> await client.zero_shot_classification( + ... text="I really like our dinner and I'm very happy. I don't like the weather though.", + ... candidate_labels=["positive", "negative", "pessimistic", "optimistic"], + ... multi_label=True, + ... hypothesis_template="This text is {} towards the weather" + ... ) + [ + ZeroShotClassificationOutputElement(label='negative', score=0.9231801629066467), + ZeroShotClassificationOutputElement(label='pessimistic', score=0.8760990500450134), + ZeroShotClassificationOutputElement(label='optimistic', score=0.0008674879791215062), + ZeroShotClassificationOutputElement(label='positive', score=0.0005250611575320363) + ] + ``` + """ + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="zero-shot-classification", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=text, + parameters={ + "candidate_labels": candidate_labels, + "multi_label": multi_label, + "hypothesis_template": hypothesis_template, + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + output = _bytes_to_dict(response) + return [ + ZeroShotClassificationOutputElement.parse_obj_as_instance({"label": label, "score": score}) + for label, score in zip(output["labels"], output["scores"]) + ] + + async def zero_shot_image_classification( + self, + image: ContentT, + candidate_labels: List[str], + *, + model: Optional[str] = None, + hypothesis_template: Optional[str] = None, + # deprecated argument + labels: List[str] = None, # type: ignore + ) -> List[ZeroShotImageClassificationOutputElement]: + """ + Provide input image and text labels to predict text labels for the image. + + Args: + image (`Union[str, Path, bytes, BinaryIO, PIL.Image.Image]`): + The input image to caption. It can be raw bytes, an image file, a URL to an online image, or a PIL Image. + candidate_labels (`List[str]`): + The candidate labels for this image. + labels (`List[str]`, *optional*): + (deprecated) List of string possible labels. There must be at least 2 labels. + model (`str`, *optional*): + The model to use for inference.
Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed + Inference Endpoint. This parameter overrides the model defined at the instance level. If not provided, the default recommended zero-shot image classification model will be used. + hypothesis_template (`str`, *optional*): + The sentence used in conjunction with `candidate_labels` to attempt the image classification by + replacing the placeholder with the candidate labels. + + Returns: + `List[ZeroShotImageClassificationOutputElement]`: List of [`ZeroShotImageClassificationOutputElement`] items containing the predicted labels and their confidence. + + Raises: + [`InferenceTimeoutError`]: + If the model is unavailable or the request times out. + `aiohttp.ClientResponseError`: + If the request fails with an HTTP error status code other than HTTP 503. + + Example: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient() + + >>> await client.zero_shot_image_classification( + ... "https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg", + ... candidate_labels=["dog", "cat", "horse"], + ... ) + [ZeroShotImageClassificationOutputElement(label='dog', score=0.956),...] + ``` + """ + # Raise ValueError if fewer than 2 candidate labels are provided + if len(candidate_labels) < 2: + raise ValueError("You must specify at least 2 classes to compare.") + + model_id = model or self.model + provider_helper = get_provider_helper(self.provider, task="zero-shot-image-classification", model=model_id) + request_parameters = provider_helper.prepare_request( + inputs=image, + parameters={ + "candidate_labels": candidate_labels, + "hypothesis_template": hypothesis_template, + }, + headers=self.headers, + model=model_id, + api_key=self.token, + ) + response = await self._inner_post(request_parameters) + return ZeroShotImageClassificationOutputElement.parse_obj_as_list(response) + + def _get_client_session(self, headers: Optional[Dict] = None) -> "ClientSession": + aiohttp = _import_aiohttp() + client_headers = self.headers.copy() + if headers is not None: + client_headers.update(headers) + + # Return a new aiohttp ClientSession with correct settings. + session = aiohttp.ClientSession( + headers=client_headers, + cookies=self.cookies, + timeout=aiohttp.ClientTimeout(self.timeout), + trust_env=self.trust_env, + ) + + # Keep track of sessions to close them later + self._sessions[session] = set() + + # Override the `._request` method to register responses to be closed + session._wrapped_request = session._request + + async def _request(method, url, **kwargs): + response = await session._wrapped_request(method, url, **kwargs) + self._sessions[session].add(response) + return response + + session._request = _request + + # Override the 'close' method to + # 1. close ongoing responses + # 2. deregister the session when closed + session._close = session.close + + async def close_session(): + for response in self._sessions[session]: + response.close() + await session._close() + self._sessions.pop(session, None) + + session.close = close_session + return session + + async def get_endpoint_info(self, *, model: Optional[str] = None) -> Dict[str, Any]: + """ + Get information about the deployed endpoint. + + This endpoint is only available on endpoints powered by Text-Generation-Inference (TGI) or Text-Embedding-Inference (TEI). + Endpoints powered by `transformers` return an empty payload. + + Args: + model (`str`, *optional*): + The model to use for inference.
Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed + Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None. + + Returns: + `Dict[str, Any]`: Information about the endpoint. + + Example: + ```py + # Must be run in an async context + >>> from huggingface_hub import AsyncInferenceClient + >>> client = AsyncInferenceClient("meta-llama/Meta-Llama-3-70B-Instruct") + >>> await client.get_endpoint_info() + { + 'model_id': 'meta-llama/Meta-Llama-3-70B-Instruct', + 'model_sha': None, + 'model_dtype': 'torch.float16', + 'model_device_type': 'cuda', + 'model_pipeline_tag': None, + 'max_concurrent_requests': 128, + 'max_best_of': 2, + 'max_stop_sequences': 4, + 'max_input_length': 8191, + 'max_total_tokens': 8192, + 'waiting_served_ratio': 0.3, + 'max_batch_total_tokens': 1259392, + 'max_waiting_tokens': 20, + 'max_batch_size': None, + 'validation_workers': 32, + 'max_client_batch_size': 4, + 'version': '2.0.2', + 'sha': 'dccab72549635c7eb5ddb17f43f0b7cdff07c214', + 'docker_label': 'sha-dccab72' + } + ``` + """ + if self.provider != "hf-inference": + raise ValueError(f"Getting endpoint info is not supported on '{self.provider}'.") + + model = model or self.model + if model is None: + raise ValueError("Model id not provided.") + if model.startswith(("http://", "https://")): + url = model.rstrip("/") + "/info" + else: + url = f"{constants.INFERENCE_ENDPOINT}/models/{model}/info" + + async with self._get_client_session(headers=build_hf_headers(token=self.token)) as client: + response = await client.get(url, proxy=self.proxies) + response.raise_for_status() + return await response.json() + + async def health_check(self, model: Optional[str] = None) -> bool: + """ + Check the health of the deployed endpoint. + + Health check is only available with Inference Endpoints powered by Text-Generation-Inference (TGI) or Text-Embedding-Inference (TEI). + + Args: + model (`str`, *optional*): + URL of the Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None. + + Returns: + `bool`: True if everything is working fine. 
+
+ Example:
+ ```py
+ # Must be run in an async context
+ >>> from huggingface_hub import AsyncInferenceClient
+ >>> client = AsyncInferenceClient("https://jzgu0buei5.us-east-1.aws.endpoints.huggingface.cloud")
+ >>> await client.health_check()
+ True
+ ```
+ """
+ if self.provider != "hf-inference":
+ raise ValueError(f"Health check is not supported on '{self.provider}'.")
+
+ model = model or self.model
+ if model is None:
+ raise ValueError("Model id not provided.")
+ if not model.startswith(("http://", "https://")):
+ raise ValueError("Model must be an Inference Endpoint URL.")
+ url = model.rstrip("/") + "/health"
+
+ async with self._get_client_session(headers=build_hf_headers(token=self.token)) as client:
+ response = await client.get(url, proxy=self.proxies)
+ return response.status == 200
+
+ @property
+ def chat(self) -> "ProxyClientChat":
+ return ProxyClientChat(self)
+
+
+class _ProxyClient:
+ """Proxy class to be able to call `client.chat.completions.create(...)` as with the OpenAI client."""
+
+ def __init__(self, client: AsyncInferenceClient):
+ self._client = client
+
+
+class ProxyClientChat(_ProxyClient):
+ """Proxy class to be able to call `client.chat.completions.create(...)` as with the OpenAI client."""
+
+ @property
+ def completions(self) -> "ProxyClientChatCompletions":
+ return ProxyClientChatCompletions(self._client)
+
+
+class ProxyClientChatCompletions(_ProxyClient):
+ """Proxy class to be able to call `client.chat.completions.create(...)` as with the OpenAI client."""
+
+ @property
+ def create(self):
+ return self._client.chat_completion diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..bfffc0ae3bce71532382ee87d03c40dc376cfae7 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__init__.py @@ -0,0 +1,192 @@ +# This file is auto-generated by `utils/generate_inference_types.py`.
+# Do not modify it manually.
+# +# ruff: noqa: F401 + +from .audio_classification import ( + AudioClassificationInput, + AudioClassificationOutputElement, + AudioClassificationOutputTransform, + AudioClassificationParameters, +) +from .audio_to_audio import AudioToAudioInput, AudioToAudioOutputElement +from .automatic_speech_recognition import ( + AutomaticSpeechRecognitionEarlyStoppingEnum, + AutomaticSpeechRecognitionGenerationParameters, + AutomaticSpeechRecognitionInput, + AutomaticSpeechRecognitionOutput, + AutomaticSpeechRecognitionOutputChunk, + AutomaticSpeechRecognitionParameters, +) +from .base import BaseInferenceType +from .chat_completion import ( + ChatCompletionInput, + ChatCompletionInputFunctionDefinition, + ChatCompletionInputFunctionName, + ChatCompletionInputGrammarType, + ChatCompletionInputJSONSchema, + ChatCompletionInputMessage, + ChatCompletionInputMessageChunk, + ChatCompletionInputMessageChunkType, + ChatCompletionInputResponseFormatJSONObject, + ChatCompletionInputResponseFormatJSONSchema, + ChatCompletionInputResponseFormatText, + ChatCompletionInputStreamOptions, + ChatCompletionInputTool, + ChatCompletionInputToolCall, + ChatCompletionInputToolChoiceClass, + ChatCompletionInputToolChoiceEnum, + ChatCompletionInputURL, + ChatCompletionOutput, + ChatCompletionOutputComplete, + ChatCompletionOutputFunctionDefinition, + ChatCompletionOutputLogprob, + ChatCompletionOutputLogprobs, + ChatCompletionOutputMessage, + ChatCompletionOutputToolCall, + ChatCompletionOutputTopLogprob, + ChatCompletionOutputUsage, + ChatCompletionStreamOutput, + ChatCompletionStreamOutputChoice, + ChatCompletionStreamOutputDelta, + ChatCompletionStreamOutputDeltaToolCall, + ChatCompletionStreamOutputFunction, + ChatCompletionStreamOutputLogprob, + ChatCompletionStreamOutputLogprobs, + ChatCompletionStreamOutputTopLogprob, + ChatCompletionStreamOutputUsage, +) +from .depth_estimation import DepthEstimationInput, DepthEstimationOutput +from .document_question_answering import ( + DocumentQuestionAnsweringInput, + DocumentQuestionAnsweringInputData, + DocumentQuestionAnsweringOutputElement, + DocumentQuestionAnsweringParameters, +) +from .feature_extraction import FeatureExtractionInput, FeatureExtractionInputTruncationDirection +from .fill_mask import FillMaskInput, FillMaskOutputElement, FillMaskParameters +from .image_classification import ( + ImageClassificationInput, + ImageClassificationOutputElement, + ImageClassificationOutputTransform, + ImageClassificationParameters, +) +from .image_segmentation import ( + ImageSegmentationInput, + ImageSegmentationOutputElement, + ImageSegmentationParameters, + ImageSegmentationSubtask, +) +from .image_to_image import ImageToImageInput, ImageToImageOutput, ImageToImageParameters, ImageToImageTargetSize +from .image_to_text import ( + ImageToTextEarlyStoppingEnum, + ImageToTextGenerationParameters, + ImageToTextInput, + ImageToTextOutput, + ImageToTextParameters, +) +from .image_to_video import ImageToVideoInput, ImageToVideoOutput, ImageToVideoParameters, ImageToVideoTargetSize +from .object_detection import ( + ObjectDetectionBoundingBox, + ObjectDetectionInput, + ObjectDetectionOutputElement, + ObjectDetectionParameters, +) +from .question_answering import ( + QuestionAnsweringInput, + QuestionAnsweringInputData, + QuestionAnsweringOutputElement, + QuestionAnsweringParameters, +) +from .sentence_similarity import SentenceSimilarityInput, SentenceSimilarityInputData +from .summarization import ( + SummarizationInput, + SummarizationOutput, + SummarizationParameters, + 
SummarizationTruncationStrategy, +) +from .table_question_answering import ( + Padding, + TableQuestionAnsweringInput, + TableQuestionAnsweringInputData, + TableQuestionAnsweringOutputElement, + TableQuestionAnsweringParameters, +) +from .text2text_generation import ( + Text2TextGenerationInput, + Text2TextGenerationOutput, + Text2TextGenerationParameters, + Text2TextGenerationTruncationStrategy, +) +from .text_classification import ( + TextClassificationInput, + TextClassificationOutputElement, + TextClassificationOutputTransform, + TextClassificationParameters, +) +from .text_generation import ( + TextGenerationInput, + TextGenerationInputGenerateParameters, + TextGenerationInputGrammarType, + TextGenerationOutput, + TextGenerationOutputBestOfSequence, + TextGenerationOutputDetails, + TextGenerationOutputFinishReason, + TextGenerationOutputPrefillToken, + TextGenerationOutputToken, + TextGenerationStreamOutput, + TextGenerationStreamOutputStreamDetails, + TextGenerationStreamOutputToken, + TypeEnum, +) +from .text_to_audio import ( + TextToAudioEarlyStoppingEnum, + TextToAudioGenerationParameters, + TextToAudioInput, + TextToAudioOutput, + TextToAudioParameters, +) +from .text_to_image import TextToImageInput, TextToImageOutput, TextToImageParameters +from .text_to_speech import ( + TextToSpeechEarlyStoppingEnum, + TextToSpeechGenerationParameters, + TextToSpeechInput, + TextToSpeechOutput, + TextToSpeechParameters, +) +from .text_to_video import TextToVideoInput, TextToVideoOutput, TextToVideoParameters +from .token_classification import ( + TokenClassificationAggregationStrategy, + TokenClassificationInput, + TokenClassificationOutputElement, + TokenClassificationParameters, +) +from .translation import TranslationInput, TranslationOutput, TranslationParameters, TranslationTruncationStrategy +from .video_classification import ( + VideoClassificationInput, + VideoClassificationOutputElement, + VideoClassificationOutputTransform, + VideoClassificationParameters, +) +from .visual_question_answering import ( + VisualQuestionAnsweringInput, + VisualQuestionAnsweringInputData, + VisualQuestionAnsweringOutputElement, + VisualQuestionAnsweringParameters, +) +from .zero_shot_classification import ( + ZeroShotClassificationInput, + ZeroShotClassificationOutputElement, + ZeroShotClassificationParameters, +) +from .zero_shot_image_classification import ( + ZeroShotImageClassificationInput, + ZeroShotImageClassificationOutputElement, + ZeroShotImageClassificationParameters, +) +from .zero_shot_object_detection import ( + ZeroShotObjectDetectionBoundingBox, + ZeroShotObjectDetectionInput, + ZeroShotObjectDetectionOutputElement, + ZeroShotObjectDetectionParameters, +) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..a7060a9612fa41f2772a765466749ceab7d712cf Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/audio_classification.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/audio_classification.cpython-312.pyc new file mode 100644 
index 0000000000000000000000000000000000000000..9a87086c496413c9db7a56ff5123b1fe8bfb09d6 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/audio_classification.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/audio_to_audio.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/audio_to_audio.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..69509c438d923af263f13183ed27bd578165c708 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/audio_to_audio.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/automatic_speech_recognition.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/automatic_speech_recognition.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..67d87164dc470c65a3915730364bb8ef16f3bcf7 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/automatic_speech_recognition.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/base.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/base.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..61bb43614ae2d0e6b92d5c9d67ccb01100926eca Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/base.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/chat_completion.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/chat_completion.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..a99c726bd506c7aea709ef634dff765cbe80db67 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/chat_completion.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/depth_estimation.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/depth_estimation.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..8e7d87da7183907e8a7fedbd08b69e2ace9b59d3 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/depth_estimation.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/document_question_answering.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/document_question_answering.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c69244c7518ed75c61740d581349138a2ef17070 Binary files /dev/null and 
b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/document_question_answering.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/feature_extraction.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/feature_extraction.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..29e9eb117de95c772b21e1148252195bade42fd3 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/feature_extraction.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/fill_mask.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/fill_mask.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..d49e561d499ca2482c54cb60ccd1e3277b4a52fc Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/fill_mask.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/image_classification.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/image_classification.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..75bb2492e9aa57bd6dc641420f073b17b84b11d7 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/image_classification.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/image_segmentation.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/image_segmentation.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..4f998e6dc91eb6af590cd7dd791938c25f66cff3 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/image_segmentation.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/image_to_image.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/image_to_image.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..889b7ae3e2c094b82600c22ad2ef7aa0bfa06d5b Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/image_to_image.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/image_to_text.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/image_to_text.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..a206475769f2669f1f7fa22cdb7c8bd9d2744f9a Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/image_to_text.cpython-312.pyc differ diff --git 
a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/image_to_video.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/image_to_video.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..67fd31b2b8d4f4ef4dea48a1d1f50b04f0c28bd7 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/image_to_video.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/object_detection.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/object_detection.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..bf63e8dd5da9b0481160b8c6c661d90cd9c280cf Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/object_detection.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/question_answering.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/question_answering.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..72867480fada25017052000a780919e5e8223088 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/question_answering.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/sentence_similarity.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/sentence_similarity.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ed13719ccdd3d47993595e355708a634f7627423 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/sentence_similarity.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/summarization.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/summarization.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..4d9512167755abd046acc9c119b95ab867d261ca Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/summarization.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/table_question_answering.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/table_question_answering.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3d17a0c0bc6353ab56634a335642b7da38bfd8e5 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/table_question_answering.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text2text_generation.cpython-312.pyc 
b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text2text_generation.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..7df9acf62d4fb85f95ec63c96b9f64550b4c72c2 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text2text_generation.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_classification.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_classification.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..0b489a2d1e2bc40e19f15215ab1e4d2e202afd0f Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_classification.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_generation.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_generation.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..8b2254cdf43f0fe7947cf559ad9fe1d3539b5364 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_generation.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_to_audio.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_to_audio.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5b68f80c03985725ef647da74517c66890a485de Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_to_audio.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_to_image.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_to_image.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..d99a743ef1c5f839a5fc0534a68e842743c000fb Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_to_image.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_to_speech.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_to_speech.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..631cdfdb2ef612da666bb7d7b5910391003042e9 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_to_speech.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_to_video.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_to_video.cpython-312.pyc new file mode 100644 index 
0000000000000000000000000000000000000000..46f2ea88637071283b8b196c478c8817213296e1 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/text_to_video.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/token_classification.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/token_classification.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..95ff3b9342dd855be7dddc184fc7568db9b9f586 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/token_classification.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/translation.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/translation.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..a6bf982dfd90956b6fa12406b9481ea51351fdb2 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/translation.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/video_classification.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/video_classification.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3971e7c057a2438c08ca2322926b018450316a0f Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/video_classification.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/visual_question_answering.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/visual_question_answering.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5b641033f6b7d83e16dd3363ce6f18eb0cd36e45 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/visual_question_answering.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/zero_shot_classification.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/zero_shot_classification.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..d5672947ca5a65a4f4dfbf4483ce3b4857b46443 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/zero_shot_classification.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/zero_shot_image_classification.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/zero_shot_image_classification.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..51cfc2bceba27488049f3a8c2134cd1ab14c3598 Binary files /dev/null and 
b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/zero_shot_image_classification.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/zero_shot_object_detection.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/zero_shot_object_detection.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..67d77b73b66d9581260dd5f5dc6b9789a8db2e0f Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/__pycache__/zero_shot_object_detection.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/audio_classification.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/audio_classification.py new file mode 100644 index 0000000000000000000000000000000000000000..053055787bce933e1fbd393cfbc00d81c43c8c2d --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/audio_classification.py @@ -0,0 +1,43 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Literal, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +AudioClassificationOutputTransform = Literal["sigmoid", "softmax", "none"] + + +@dataclass_with_extra +class AudioClassificationParameters(BaseInferenceType): + """Additional inference parameters for Audio Classification""" + + function_to_apply: Optional["AudioClassificationOutputTransform"] = None + """The function to apply to the model outputs in order to retrieve the scores.""" + top_k: Optional[int] = None + """When specified, limits the output to the top K most probable classes.""" + + +@dataclass_with_extra +class AudioClassificationInput(BaseInferenceType): + """Inputs for Audio Classification inference""" + + inputs: str + """The input audio data as a base64-encoded string. If no `parameters` are provided, you can + also provide the audio data as a raw bytes payload. + """ + parameters: Optional[AudioClassificationParameters] = None + """Additional inference parameters for Audio Classification""" + + +@dataclass_with_extra +class AudioClassificationOutputElement(BaseInferenceType): + """Outputs for Audio Classification inference""" + + label: str + """The predicted class label.""" + score: float + """The corresponding probability.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/audio_to_audio.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/audio_to_audio.py new file mode 100644 index 0000000000000000000000000000000000000000..43f376b5345fab6b854b028d1c17416c020d7bc1 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/audio_to_audio.py @@ -0,0 +1,30 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. 
+#
+# See:
+# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts
+# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks.
+from typing import Any
+
+from .base import BaseInferenceType, dataclass_with_extra
+
+
+@dataclass_with_extra
+class AudioToAudioInput(BaseInferenceType):
+ """Inputs for Audio to Audio inference"""
+
+ inputs: Any
+ """The input audio data"""
+
+
+@dataclass_with_extra
+class AudioToAudioOutputElement(BaseInferenceType):
+ """Outputs of inference for the Audio To Audio task
+ A generated audio file with its label.
+ """
+
+ blob: Any
+ """The generated audio file."""
+ content_type: str
+ """The content type of the audio file."""
+ label: str
+ """The label of the audio file.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/automatic_speech_recognition.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/automatic_speech_recognition.py new file mode 100644 index 0000000000000000000000000000000000000000..f6bfd28256c82309b160f337aba5a54e2dd11872 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/automatic_speech_recognition.py @@ -0,0 +1,113 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks.
+#
+# See:
+# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts
+# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks.
+from typing import List, Literal, Optional, Union
+
+from .base import BaseInferenceType, dataclass_with_extra
+
+
+AutomaticSpeechRecognitionEarlyStoppingEnum = Literal["never"]
+
+
+@dataclass_with_extra
+class AutomaticSpeechRecognitionGenerationParameters(BaseInferenceType):
+ """Parametrization of the text generation process"""
+
+ do_sample: Optional[bool] = None
+ """Whether to use sampling instead of greedy decoding when generating new tokens."""
+ early_stopping: Optional[Union[bool, "AutomaticSpeechRecognitionEarlyStoppingEnum"]] = None
+ """Controls the stopping condition for beam-based methods."""
+ epsilon_cutoff: Optional[float] = None
+ """If set to float strictly between 0 and 1, only tokens with a conditional probability
+ greater than epsilon_cutoff will be sampled. In the paper, suggested values range from
+ 3e-4 to 9e-4, depending on the size of the model. See [Truncation Sampling as Language
+ Model Desmoothing](https://hf.co/papers/2210.15191) for more details.
+ """
+ eta_cutoff: Optional[float] = None
+ """Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to
+ float strictly between 0 and 1, a token is only considered if it is greater than either
+ eta_cutoff or sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits))). The latter
+ term is intuitively the expected next token probability, scaled by sqrt(eta_cutoff). In
+ the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model.
+ See [Truncation Sampling as Language Model Desmoothing](https://hf.co/papers/2210.15191)
+ for more details.
+ """
+ max_length: Optional[int] = None
+ """The maximum length (in tokens) of the generated text, including the input."""
+ max_new_tokens: Optional[int] = None
+ """The maximum number of tokens to generate.
Takes precedence over max_length."""
+ min_length: Optional[int] = None
+ """The minimum length (in tokens) of the generated text, including the input."""
+ min_new_tokens: Optional[int] = None
+ """The minimum number of tokens to generate. Takes precedence over min_length."""
+ num_beam_groups: Optional[int] = None
+ """Number of groups to divide num_beams into in order to ensure diversity among different
+ groups of beams. See [this paper](https://hf.co/papers/1610.02424) for more details.
+ """
+ num_beams: Optional[int] = None
+ """Number of beams to use for beam search."""
+ penalty_alpha: Optional[float] = None
+ """The value balances the model confidence and the degeneration penalty in contrastive
+ search decoding.
+ """
+ temperature: Optional[float] = None
+ """The value used to modulate the next token probabilities."""
+ top_k: Optional[int] = None
+ """The number of highest probability vocabulary tokens to keep for top-k-filtering."""
+ top_p: Optional[float] = None
+ """If set to float < 1, only the smallest set of most probable tokens with probabilities
+ that add up to top_p or higher are kept for generation.
+ """
+ typical_p: Optional[float] = None
+ """Local typicality measures how similar the conditional probability of predicting a target
+ token next is to the expected conditional probability of predicting a random token next,
+ given the partial text already generated. If set to float < 1, the smallest set of the
+ most locally typical tokens with probabilities that add up to typical_p or higher are
+ kept for generation. See [this paper](https://hf.co/papers/2202.00666) for more details.
+ """
+ use_cache: Optional[bool] = None
+ """Whether the model should use the past key/value attentions to speed up decoding"""
+
+
+@dataclass_with_extra
+class AutomaticSpeechRecognitionParameters(BaseInferenceType):
+ """Additional inference parameters for Automatic Speech Recognition"""
+
+ generation_parameters: Optional[AutomaticSpeechRecognitionGenerationParameters] = None
+ """Parametrization of the text generation process"""
+ return_timestamps: Optional[bool] = None
+ """Whether to output corresponding timestamps with the generated text"""
+
+
+@dataclass_with_extra
+class AutomaticSpeechRecognitionInput(BaseInferenceType):
+ """Inputs for Automatic Speech Recognition inference"""
+
+ inputs: str
+ """The input audio data as a base64-encoded string. If no `parameters` are provided, you can
+ also provide the audio data as a raw bytes payload.
+ """
+ parameters: Optional[AutomaticSpeechRecognitionParameters] = None
+ """Additional inference parameters for Automatic Speech Recognition"""
+
+
+@dataclass_with_extra
+class AutomaticSpeechRecognitionOutputChunk(BaseInferenceType):
+ text: str
+ """A chunk of text identified by the model"""
+ timestamp: List[float]
+ """The start and end timestamps corresponding with the text"""
+
+
+@dataclass_with_extra
+class AutomaticSpeechRecognitionOutput(BaseInferenceType):
+ """Outputs of inference for the Automatic Speech Recognition task"""
+
+ text: str
+ """The recognized text."""
+ chunks: Optional[List[AutomaticSpeechRecognitionOutputChunk]] = None
+ """When return_timestamps is enabled, chunks contains a list of audio chunks identified by
+ the model.
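+
+ For illustration only (the values below are made up), a chunk produced with
+ return_timestamps enabled could look like:
+
+ ```py
+ AutomaticSpeechRecognitionOutputChunk(text=" Hello world", timestamp=[0.0, 1.3])
+ ```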
+ """ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/base.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/base.py new file mode 100644 index 0000000000000000000000000000000000000000..1f0c4687ceccbfb738da3f38c583c2516d065a01 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/base.py @@ -0,0 +1,161 @@ +# Copyright 2024 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Contains a base class for all inference types."""
+
+import inspect
+import json
+from dataclasses import asdict, dataclass
+from typing import Any, Dict, List, Type, TypeVar, Union, get_args
+
+
+T = TypeVar("T", bound="BaseInferenceType")
+
+
+def _repr_with_extra(self):
+ fields = list(self.__dataclass_fields__.keys())
+ other_fields = list(k for k in self.__dict__ if k not in fields)
+ return f"{self.__class__.__name__}({', '.join(f'{k}={self.__dict__[k]!r}' for k in fields + other_fields)})"
+
+
+def dataclass_with_extra(cls: Type[T]) -> Type[T]:
+ """Decorator to add a custom __repr__ method to a dataclass, showing all fields, including extra ones.
+
+ This decorator only works with dataclasses that inherit from `BaseInferenceType`.
+ """
+ cls = dataclass(cls)
+ cls.__repr__ = _repr_with_extra  # type: ignore[method-assign]
+ return cls
+
+
+@dataclass
+class BaseInferenceType(dict):
+ """Base class for all inference types.
+
+ Object is both a dataclass and a dict for backward compatibility, but the plan is to remove the dict part in the future.
+
+ Handles parsing from dict, list and json strings in a permissive way to ensure future-compatibility (e.g. all fields
+ are made optional, and unexpected fields are added as dict attributes).
+ """
+
+ @classmethod
+ def parse_obj_as_list(cls: Type[T], data: Union[bytes, str, List, Dict]) -> List[T]:
+ """Alias to parse server response and return a list of instances.
+
+ See `parse_obj` for more details.
+ """
+ output = cls.parse_obj(data)
+ if not isinstance(output, list):
+ raise ValueError(f"Invalid input data for {cls}. Expected a list, but got {type(output)}.")
+ return output
+
+ @classmethod
+ def parse_obj_as_instance(cls: Type[T], data: Union[bytes, str, List, Dict]) -> T:
+ """Alias to parse server response and return a single instance.
+
+ See `parse_obj` for more details.
+ """
+ output = cls.parse_obj(data)
+ if isinstance(output, list):
+ raise ValueError(f"Invalid input data for {cls}. Expected a single instance, but got a list.")
+ return output
+
+ @classmethod
+ def parse_obj(cls: Type[T], data: Union[bytes, str, List, Dict]) -> Union[List[T], T]:
+ """Parse server response as a dataclass or list of dataclasses.
+
+ To enable future-compatibility, we want to handle cases where the server returns more fields than expected.
+ In such cases, we don't want to raise an error but still create the dataclass object. Remaining fields are
+ added as dict attributes.
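+
+ Illustrative example (the extra `rank` field below is made up, to show how
+ unexpected fields are kept):
+
+ ```py
+ >>> out = AudioClassificationOutputElement.parse_obj(b'[{"label": "dog", "score": 0.9, "rank": 1}]')
+ >>> out[0].label, out[0].score
+ ('dog', 0.9)
+ >>> out[0].rank  # unexpected field, still accessible as an attribute
+ 1
+ ```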
+ """
+ # Parse server response (from bytes)
+ if isinstance(data, bytes):
+ data = data.decode()
+ if isinstance(data, str):
+ data = json.loads(data)
+
+ # If a list, parse each item individually
+ if isinstance(data, List):
+ return [cls.parse_obj(d) for d in data]  # type: ignore [misc]
+
+ # At this point, we expect a dict
+ if not isinstance(data, dict):
+ raise ValueError(f"Invalid data type: {type(data)}")
+
+ init_values = {}
+ other_values = {}
+ for key, value in data.items():
+ key = normalize_key(key)
+ if key in cls.__dataclass_fields__ and cls.__dataclass_fields__[key].init:
+ if isinstance(value, dict) or isinstance(value, list):
+ field_type = cls.__dataclass_fields__[key].type
+
+ # if `field_type` is a `BaseInferenceType`, parse it
+ if inspect.isclass(field_type) and issubclass(field_type, BaseInferenceType):
+ value = field_type.parse_obj(value)
+
+ # otherwise, recursively parse nested dataclasses (if possible)
+ # `get_args` handles Union and Optional for us
+ else:
+ expected_types = get_args(field_type)
+ for expected_type in expected_types:
+ if getattr(expected_type, "_name", None) == "List":
+ expected_type = get_args(expected_type)[
+ 0
+ ]  # assume same type for all items in the list
+ if inspect.isclass(expected_type) and issubclass(expected_type, BaseInferenceType):
+ value = expected_type.parse_obj(value)
+ break
+ init_values[key] = value
+ else:
+ other_values[key] = value
+
+ # Make all missing fields default to None
+ # => ensure that dataclass initialization will never fail even if the server does not return all fields.
+ for key in cls.__dataclass_fields__:
+ if key not in init_values:
+ init_values[key] = None
+
+ # Initialize dataclass with expected values
+ item = cls(**init_values)
+
+ # Add remaining fields as dict attributes
+ item.update(other_values)
+
+ # Add remaining fields as extra dataclass fields.
+ # They won't be part of the dataclass fields but will be accessible as attributes.
+ # Use @dataclass_with_extra to show them in __repr__.
+ item.__dict__.update(other_values)
+ return item
+
+ def __post_init__(self):
+ self.update(asdict(self))
+
+ def __setitem__(self, __key: Any, __value: Any) -> None:
+ # Hacky way to keep dataclass values in sync when dict is updated
+ super().__setitem__(__key, __value)
+ if __key in self.__dataclass_fields__ and getattr(self, __key, None) != __value:
+ self.__setattr__(__key, __value)
+ return
+
+ def __setattr__(self, __name: str, __value: Any) -> None:
+ # Hacky way to keep dict values in sync when dataclass is updated
+ super().__setattr__(__name, __value)
+ if self.get(__name) != __value:
+ self[__name] = __value
+ return
+
+
+def normalize_key(key: str) -> str:
+ # e.g "content-type" -> "content_type", "Accept" -> "accept"
+ return key.replace("-", "_").replace(" ", "_").lower() diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/chat_completion.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/chat_completion.py new file mode 100644 index 0000000000000000000000000000000000000000..ba708a7009bf14cd182a999ccf95f07ee2a002b8 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/chat_completion.py @@ -0,0 +1,347 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks.
+# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Any, Dict, List, Literal, Optional, Union + +from .base import BaseInferenceType, dataclass_with_extra + + +@dataclass_with_extra +class ChatCompletionInputURL(BaseInferenceType): + url: str + + +ChatCompletionInputMessageChunkType = Literal["text", "image_url"] + + +@dataclass_with_extra +class ChatCompletionInputMessageChunk(BaseInferenceType): + type: "ChatCompletionInputMessageChunkType" + image_url: Optional[ChatCompletionInputURL] = None + text: Optional[str] = None + + +@dataclass_with_extra +class ChatCompletionInputFunctionDefinition(BaseInferenceType): + name: str + parameters: Any + description: Optional[str] = None + + +@dataclass_with_extra +class ChatCompletionInputToolCall(BaseInferenceType): + function: ChatCompletionInputFunctionDefinition + id: str + type: str + + +@dataclass_with_extra +class ChatCompletionInputMessage(BaseInferenceType): + role: str + content: Optional[Union[List[ChatCompletionInputMessageChunk], str]] = None + name: Optional[str] = None + tool_calls: Optional[List[ChatCompletionInputToolCall]] = None + + +@dataclass_with_extra +class ChatCompletionInputJSONSchema(BaseInferenceType): + name: str + """ + The name of the response format. + """ + description: Optional[str] = None + """ + A description of what the response format is for, used by the model to determine + how to respond in the format. + """ + schema: Optional[Dict[str, object]] = None + """ + The schema for the response format, described as a JSON Schema object. Learn how + to build JSON schemas [here](https://json-schema.org/). + """ + strict: Optional[bool] = None + """ + Whether to enable strict schema adherence when generating the output. If set to + true, the model will always follow the exact schema defined in the `schema` + field. + """ + + +@dataclass_with_extra +class ChatCompletionInputResponseFormatText(BaseInferenceType): + type: Literal["text"] + + +@dataclass_with_extra +class ChatCompletionInputResponseFormatJSONSchema(BaseInferenceType): + type: Literal["json_schema"] + json_schema: ChatCompletionInputJSONSchema + + +@dataclass_with_extra +class ChatCompletionInputResponseFormatJSONObject(BaseInferenceType): + type: Literal["json_object"] + + +ChatCompletionInputGrammarType = Union[ + ChatCompletionInputResponseFormatText, + ChatCompletionInputResponseFormatJSONSchema, + ChatCompletionInputResponseFormatJSONObject, +] + + +@dataclass_with_extra +class ChatCompletionInputStreamOptions(BaseInferenceType): + include_usage: Optional[bool] = None + """If set, an additional chunk will be streamed before the data: [DONE] message. The usage + field on this chunk shows the token usage statistics for the entire request, and the + choices field will always be an empty array. All other chunks will also include a usage + field, but with a null value. 
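+
+ As an illustrative sketch (field values are assumptions, not defaults), requesting
+ usage reporting on a streamed chat completion could look like:
+
+ ```py
+ ChatCompletionInput(
+     messages=[ChatCompletionInputMessage(role="user", content="Hi")],
+     stream=True,
+     stream_options=ChatCompletionInputStreamOptions(include_usage=True),
+ )
+ ```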
+ """ + + +@dataclass_with_extra +class ChatCompletionInputFunctionName(BaseInferenceType): + name: str + + +@dataclass_with_extra +class ChatCompletionInputToolChoiceClass(BaseInferenceType): + function: ChatCompletionInputFunctionName + + +ChatCompletionInputToolChoiceEnum = Literal["auto", "none", "required"] + + +@dataclass_with_extra +class ChatCompletionInputTool(BaseInferenceType): + function: ChatCompletionInputFunctionDefinition + type: str + + +@dataclass_with_extra +class ChatCompletionInput(BaseInferenceType): + """Chat Completion Input. + Auto-generated from TGI specs. + For more details, check out + https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-tgi-import.ts. + """ + + messages: List[ChatCompletionInputMessage] + """A list of messages comprising the conversation so far.""" + frequency_penalty: Optional[float] = None + """Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing + frequency in the text so far, + decreasing the model's likelihood to repeat the same line verbatim. + """ + logit_bias: Optional[List[float]] = None + """UNUSED + Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON + object that maps tokens + (specified by their token ID in the tokenizer) to an associated bias value from -100 to + 100. Mathematically, + the bias is added to the logits generated by the model prior to sampling. The exact + effect will vary per model, + but values between -1 and 1 should decrease or increase likelihood of selection; values + like -100 or 100 should + result in a ban or exclusive selection of the relevant token. + """ + logprobs: Optional[bool] = None + """Whether to return log probabilities of the output tokens or not. If true, returns the log + probabilities of each + output token returned in the content of message. + """ + max_tokens: Optional[int] = None + """The maximum number of tokens that can be generated in the chat completion.""" + model: Optional[str] = None + """[UNUSED] ID of the model to use. See the model endpoint compatibility table for details + on which models work with the Chat API. + """ + n: Optional[int] = None + """UNUSED + How many chat completion choices to generate for each input message. Note that you will + be charged based on the + number of generated tokens across all of the choices. Keep n as 1 to minimize costs. + """ + presence_penalty: Optional[float] = None + """Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they + appear in the text so far, + increasing the model's likelihood to talk about new topics + """ + response_format: Optional[ChatCompletionInputGrammarType] = None + seed: Optional[int] = None + stop: Optional[List[str]] = None + """Up to 4 sequences where the API will stop generating further tokens.""" + stream: Optional[bool] = None + stream_options: Optional[ChatCompletionInputStreamOptions] = None + temperature: Optional[float] = None + """What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the + output more random, while + lower values like 0.2 will make it more focused and deterministic. + We generally recommend altering this or `top_p` but not both. + """ + tool_choice: Optional[Union[ChatCompletionInputToolChoiceClass, "ChatCompletionInputToolChoiceEnum"]] = None + tool_prompt: Optional[str] = None + """A prompt to be appended before the tools""" + tools: Optional[List[ChatCompletionInputTool]] = None + """A list of tools the model may call. 
Currently, only functions are supported as a tool. + Use this to provide a list of + functions the model may generate JSON inputs for. + """ + top_logprobs: Optional[int] = None + """An integer between 0 and 5 specifying the number of most likely tokens to return at each + token position, each with + an associated log probability. logprobs must be set to true if this parameter is used. + """ + top_p: Optional[float] = None + """An alternative to sampling with temperature, called nucleus sampling, where the model + considers the results of the + tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% + probability mass are considered. + """ + + +@dataclass_with_extra +class ChatCompletionOutputTopLogprob(BaseInferenceType): + logprob: float + token: str + + +@dataclass_with_extra +class ChatCompletionOutputLogprob(BaseInferenceType): + logprob: float + token: str + top_logprobs: List[ChatCompletionOutputTopLogprob] + + +@dataclass_with_extra +class ChatCompletionOutputLogprobs(BaseInferenceType): + content: List[ChatCompletionOutputLogprob] + + +@dataclass_with_extra +class ChatCompletionOutputFunctionDefinition(BaseInferenceType): + arguments: str + name: str + description: Optional[str] = None + + +@dataclass_with_extra +class ChatCompletionOutputToolCall(BaseInferenceType): + function: ChatCompletionOutputFunctionDefinition + id: str + type: str + + +@dataclass_with_extra +class ChatCompletionOutputMessage(BaseInferenceType): + role: str + content: Optional[str] = None + reasoning: Optional[str] = None + tool_call_id: Optional[str] = None + tool_calls: Optional[List[ChatCompletionOutputToolCall]] = None + + +@dataclass_with_extra +class ChatCompletionOutputComplete(BaseInferenceType): + finish_reason: str + index: int + message: ChatCompletionOutputMessage + logprobs: Optional[ChatCompletionOutputLogprobs] = None + + +@dataclass_with_extra +class ChatCompletionOutputUsage(BaseInferenceType): + completion_tokens: int + prompt_tokens: int + total_tokens: int + + +@dataclass_with_extra +class ChatCompletionOutput(BaseInferenceType): + """Chat Completion Output. + Auto-generated from TGI specs. + For more details, check out + https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-tgi-import.ts. 
+ """ + + choices: List[ChatCompletionOutputComplete] + created: int + id: str + model: str + system_fingerprint: str + usage: ChatCompletionOutputUsage + + +@dataclass_with_extra +class ChatCompletionStreamOutputFunction(BaseInferenceType): + arguments: str + name: Optional[str] = None + + +@dataclass_with_extra +class ChatCompletionStreamOutputDeltaToolCall(BaseInferenceType): + function: ChatCompletionStreamOutputFunction + id: str + index: int + type: str + + +@dataclass_with_extra +class ChatCompletionStreamOutputDelta(BaseInferenceType): + role: str + content: Optional[str] = None + reasoning: Optional[str] = None + tool_call_id: Optional[str] = None + tool_calls: Optional[List[ChatCompletionStreamOutputDeltaToolCall]] = None + + +@dataclass_with_extra +class ChatCompletionStreamOutputTopLogprob(BaseInferenceType): + logprob: float + token: str + + +@dataclass_with_extra +class ChatCompletionStreamOutputLogprob(BaseInferenceType): + logprob: float + token: str + top_logprobs: List[ChatCompletionStreamOutputTopLogprob] + + +@dataclass_with_extra +class ChatCompletionStreamOutputLogprobs(BaseInferenceType): + content: List[ChatCompletionStreamOutputLogprob] + + +@dataclass_with_extra +class ChatCompletionStreamOutputChoice(BaseInferenceType): + delta: ChatCompletionStreamOutputDelta + index: int + finish_reason: Optional[str] = None + logprobs: Optional[ChatCompletionStreamOutputLogprobs] = None + + +@dataclass_with_extra +class ChatCompletionStreamOutputUsage(BaseInferenceType): + completion_tokens: int + prompt_tokens: int + total_tokens: int + + +@dataclass_with_extra +class ChatCompletionStreamOutput(BaseInferenceType): + """Chat Completion Stream Output. + Auto-generated from TGI specs. + For more details, check out + https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-tgi-import.ts. + """ + + choices: List[ChatCompletionStreamOutputChoice] + created: int + id: str + model: str + system_fingerprint: str + usage: Optional[ChatCompletionStreamOutputUsage] = None diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/depth_estimation.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/depth_estimation.py new file mode 100644 index 0000000000000000000000000000000000000000..1e09bdffa194f97444e484de6e930f67ac030207 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/depth_estimation.py @@ -0,0 +1,28 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. 
+from typing import Any, Dict, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +@dataclass_with_extra +class DepthEstimationInput(BaseInferenceType): + """Inputs for Depth Estimation inference""" + + inputs: Any + """The input image data""" + parameters: Optional[Dict[str, Any]] = None + """Additional inference parameters for Depth Estimation""" + + +@dataclass_with_extra +class DepthEstimationOutput(BaseInferenceType): + """Outputs of inference for the Depth Estimation task""" + + depth: Any + """The predicted depth as an image""" + predicted_depth: Any + """The predicted depth as a tensor""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/document_question_answering.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/document_question_answering.py new file mode 100644 index 0000000000000000000000000000000000000000..2457d2c8c237f055f660e0e8291d846bb036949d --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/document_question_answering.py @@ -0,0 +1,80 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Any, List, Optional, Union + +from .base import BaseInferenceType, dataclass_with_extra + + +@dataclass_with_extra +class DocumentQuestionAnsweringInputData(BaseInferenceType): + """One (document, question) pair to answer""" + + image: Any + """The image on which the question is asked""" + question: str + """A question to ask of the document""" + + +@dataclass_with_extra +class DocumentQuestionAnsweringParameters(BaseInferenceType): + """Additional inference parameters for Document Question Answering""" + + doc_stride: Optional[int] = None + """If the words in the document are too long to fit with the question for the model, the + document will be split into several chunks with some overlap. This argument controls the + size of that overlap. + """ + handle_impossible_answer: Optional[bool] = None + """Whether to accept impossible as an answer""" + lang: Optional[str] = None + """Language to use while running OCR. Defaults to English.""" + max_answer_len: Optional[int] = None + """The maximum length of predicted answers (e.g., only answers with a shorter length are + considered). + """ + max_question_len: Optional[int] = None + """The maximum length of the question after tokenization. It will be truncated if needed.""" + max_seq_len: Optional[int] = None + """The maximum length of the total sentence (context + question) in tokens of each chunk + passed to the model. The context will be split into several chunks (using doc_stride as + overlap) if needed. + """ + top_k: Optional[int] = None + """The number of answers to return (will be chosen by order of likelihood). May return + fewer than top_k answers if there are not enough options available within the context. + """ + word_boxes: Optional[List[Union[List[float], str]]] = None + """A list of words and bounding boxes (normalized 0->1000). If provided, the inference will + skip the OCR step and use the provided bounding boxes instead.
+ """ + + +@dataclass_with_extra +class DocumentQuestionAnsweringInput(BaseInferenceType): + """Inputs for Document Question Answering inference""" + + inputs: DocumentQuestionAnsweringInputData + """One (document, question) pair to answer""" + parameters: Optional[DocumentQuestionAnsweringParameters] = None + """Additional inference parameters for Document Question Answering""" + + +@dataclass_with_extra +class DocumentQuestionAnsweringOutputElement(BaseInferenceType): + """Outputs of inference for the Document Question Answering task""" + + answer: str + """The answer to the question.""" + end: int + """The end word index of the answer (in the OCR’d version of the input or provided word + boxes). + """ + score: float + """The probability associated to the answer.""" + start: int + """The start word index of the answer (in the OCR’d version of the input or provided word + boxes). + """ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/feature_extraction.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/feature_extraction.py new file mode 100644 index 0000000000000000000000000000000000000000..e965ddbac2af0a5bf73e662a7c18c847611d18a1 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/feature_extraction.py @@ -0,0 +1,36 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import List, Literal, Optional, Union + +from .base import BaseInferenceType, dataclass_with_extra + + +FeatureExtractionInputTruncationDirection = Literal["Left", "Right"] + + +@dataclass_with_extra +class FeatureExtractionInput(BaseInferenceType): + """Feature Extraction Input. + Auto-generated from TEI specs. + For more details, check out + https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-tei-import.ts. + """ + + inputs: Union[List[str], str] + """The text or list of texts to embed.""" + normalize: Optional[bool] = None + prompt_name: Optional[str] = None + """The name of the prompt that should be used by for encoding. If not set, no prompt + will be applied. + Must be a key in the `sentence-transformers` configuration `prompts` dictionary. + For example if ``prompt_name`` is "query" and the ``prompts`` is {"query": "query: ", + ...}, + then the sentence "What is the capital of France?" will be encoded as + "query: What is the capital of France?" because the prompt text will be prepended before + any text to encode. + """ + truncate: Optional[bool] = None + truncation_direction: Optional["FeatureExtractionInputTruncationDirection"] = None diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/fill_mask.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/fill_mask.py new file mode 100644 index 0000000000000000000000000000000000000000..dfcdc56bc507e50280d38e0f63b024ada6a7ea94 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/fill_mask.py @@ -0,0 +1,47 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. 
+# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Any, List, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +@dataclass_with_extra +class FillMaskParameters(BaseInferenceType): + """Additional inference parameters for Fill Mask""" + + targets: Optional[List[str]] = None + """When passed, the model will limit the scores to the passed targets instead of looking up + in the whole vocabulary. If the provided targets are not in the model vocab, they will be + tokenized and the first resulting token will be used (with a warning, and that might be + slower). + """ + top_k: Optional[int] = None + """When passed, overrides the number of predictions to return.""" + + +@dataclass_with_extra +class FillMaskInput(BaseInferenceType): + """Inputs for Fill Mask inference""" + + inputs: str + """The text with masked tokens""" + parameters: Optional[FillMaskParameters] = None + """Additional inference parameters for Fill Mask""" + + +@dataclass_with_extra +class FillMaskOutputElement(BaseInferenceType): + """Outputs of inference for the Fill Mask task""" + + score: float + """The corresponding probability""" + sequence: str + """The corresponding input with the mask token prediction.""" + token: int + """The predicted token id (to replace the masked one).""" + token_str: Any + fill_mask_output_token_str: Optional[str] = None + """The predicted token (to replace the masked one).""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/image_classification.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/image_classification.py new file mode 100644 index 0000000000000000000000000000000000000000..0fdda6c83ff4c7aee5dc7794f0530e89d6b43047 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/image_classification.py @@ -0,0 +1,43 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Literal, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +ImageClassificationOutputTransform = Literal["sigmoid", "softmax", "none"] + + +@dataclass_with_extra +class ImageClassificationParameters(BaseInferenceType): + """Additional inference parameters for Image Classification""" + + function_to_apply: Optional["ImageClassificationOutputTransform"] = None + """The function to apply to the model outputs in order to retrieve the scores.""" + top_k: Optional[int] = None + """When specified, limits the output to the top K most probable classes.""" + + +@dataclass_with_extra +class ImageClassificationInput(BaseInferenceType): + """Inputs for Image Classification inference""" + + inputs: str + """The input image data as a base64-encoded string. If no `parameters` are provided, you can + also provide the image data as a raw bytes payload. 
+ """ + parameters: Optional[ImageClassificationParameters] = None + """Additional inference parameters for Image Classification""" + + +@dataclass_with_extra +class ImageClassificationOutputElement(BaseInferenceType): + """Outputs of inference for the Image Classification task""" + + label: str + """The predicted class label.""" + score: float + """The corresponding probability.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/image_segmentation.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/image_segmentation.py new file mode 100644 index 0000000000000000000000000000000000000000..3dbf61db83ec2ae6ceafd901c4425567cd2e5b03 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/image_segmentation.py @@ -0,0 +1,51 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Literal, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +ImageSegmentationSubtask = Literal["instance", "panoptic", "semantic"] + + +@dataclass_with_extra +class ImageSegmentationParameters(BaseInferenceType): + """Additional inference parameters for Image Segmentation""" + + mask_threshold: Optional[float] = None + """Threshold to use when turning the predicted masks into binary values.""" + overlap_mask_area_threshold: Optional[float] = None + """Mask overlap threshold to eliminate small, disconnected segments.""" + subtask: Optional["ImageSegmentationSubtask"] = None + """Segmentation task to be performed, depending on model capabilities.""" + threshold: Optional[float] = None + """Probability threshold to filter out predicted masks.""" + + +@dataclass_with_extra +class ImageSegmentationInput(BaseInferenceType): + """Inputs for Image Segmentation inference""" + + inputs: str + """The input image data as a base64-encoded string. If no `parameters` are provided, you can + also provide the image data as a raw bytes payload. + """ + parameters: Optional[ImageSegmentationParameters] = None + """Additional inference parameters for Image Segmentation""" + + +@dataclass_with_extra +class ImageSegmentationOutputElement(BaseInferenceType): + """Outputs of inference for the Image Segmentation task + A predicted mask / segment + """ + + label: str + """The label of the predicted segment.""" + mask: str + """The corresponding mask as a black-and-white image (base64-encoded).""" + score: Optional[float] = None + """The score or confidence degree the model has.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/image_to_image.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/image_to_image.py new file mode 100644 index 0000000000000000000000000000000000000000..b14c79fedf228bb66fa88327c6d2601e77b8d6c6 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/image_to_image.py @@ -0,0 +1,60 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. 
+# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Any, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +@dataclass_with_extra +class ImageToImageTargetSize(BaseInferenceType): + """The size in pixels of the output image. This parameter is only supported by some + providers and for specific models. It will be ignored when unsupported. + """ + + height: int + width: int + + +@dataclass_with_extra +class ImageToImageParameters(BaseInferenceType): + """Additional inference parameters for Image To Image""" + + guidance_scale: Optional[float] = None + """For diffusion models. A higher guidance scale value encourages the model to generate + images closely linked to the text prompt at the expense of lower image quality. + """ + negative_prompt: Optional[str] = None + """One prompt to guide what NOT to include in image generation.""" + num_inference_steps: Optional[int] = None + """For diffusion models. The number of denoising steps. More denoising steps usually lead to + a higher quality image at the expense of slower inference. + """ + prompt: Optional[str] = None + """The text prompt to guide the image generation.""" + target_size: Optional[ImageToImageTargetSize] = None + """The size in pixels of the output image. This parameter is only supported by some + providers and for specific models. It will be ignored when unsupported. + """ + + +@dataclass_with_extra +class ImageToImageInput(BaseInferenceType): + """Inputs for Image To Image inference""" + + inputs: str + """The input image data as a base64-encoded string. If no `parameters` are provided, you can + also provide the image data as a raw bytes payload. + """ + parameters: Optional[ImageToImageParameters] = None + """Additional inference parameters for Image To Image""" + + +@dataclass_with_extra +class ImageToImageOutput(BaseInferenceType): + """Outputs of inference for the Image To Image task""" + + image: Any + """The output image returned as raw bytes in the payload.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/image_to_text.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/image_to_text.py new file mode 100644 index 0000000000000000000000000000000000000000..b65e0e0068e80dbcab5a4706fb5d49be2538c4ca --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/image_to_text.py @@ -0,0 +1,100 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. 
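# Hedged illustration (not part of the generated diff): assembling an
# ImageToImageInput from the dataclasses above. The file path and parameter
# values are placeholders; the import assumes the types package re-exports
# these names.
import base64

from huggingface_hub.inference._generated.types import (
    ImageToImageInput,
    ImageToImageParameters,
    ImageToImageTargetSize,
)

with open("input.png", "rb") as f:  # placeholder local image
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = ImageToImageInput(
    inputs=image_b64,
    parameters=ImageToImageParameters(
        prompt="a watercolor rendition of the scene",
        negative_prompt="blurry",
        num_inference_steps=25,
        guidance_scale=7.5,
        target_size=ImageToImageTargetSize(height=512, width=512),
    ),
)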
+from typing import Any, Literal, Optional, Union + +from .base import BaseInferenceType, dataclass_with_extra + + +ImageToTextEarlyStoppingEnum = Literal["never"] + + +@dataclass_with_extra +class ImageToTextGenerationParameters(BaseInferenceType): + """Parametrization of the text generation process""" + + do_sample: Optional[bool] = None + """Whether to use sampling instead of greedy decoding when generating new tokens.""" + early_stopping: Optional[Union[bool, "ImageToTextEarlyStoppingEnum"]] = None + """Controls the stopping condition for beam-based methods.""" + epsilon_cutoff: Optional[float] = None + """If set to float strictly between 0 and 1, only tokens with a conditional probability + greater than epsilon_cutoff will be sampled. In the paper, suggested values range from + 3e-4 to 9e-4, depending on the size of the model. See [Truncation Sampling as Language + Model Desmoothing](https://hf.co/papers/2210.15191) for more details. + """ + eta_cutoff: Optional[float] = None + """Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to + float strictly between 0 and 1, a token is only considered if it is greater than either + eta_cutoff or sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits))). The latter + term is intuitively the expected next token probability, scaled by sqrt(eta_cutoff). In + the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model. + See [Truncation Sampling as Language Model Desmoothing](https://hf.co/papers/2210.15191) + for more details. + """ + max_length: Optional[int] = None + """The maximum length (in tokens) of the generated text, including the input.""" + max_new_tokens: Optional[int] = None + """The maximum number of tokens to generate. Takes precedence over max_length.""" + min_length: Optional[int] = None + """The minimum length (in tokens) of the generated text, including the input.""" + min_new_tokens: Optional[int] = None + """The minimum number of tokens to generate. Takes precedence over min_length.""" + num_beam_groups: Optional[int] = None + """Number of groups to divide num_beams into in order to ensure diversity among different + groups of beams. See [this paper](https://hf.co/papers/1610.02424) for more details. + """ + num_beams: Optional[int] = None + """Number of beams to use for beam search.""" + penalty_alpha: Optional[float] = None + """The value balances the model confidence and the degeneration penalty in contrastive + search decoding. + """ + temperature: Optional[float] = None + """The value used to modulate the next token probabilities.""" + top_k: Optional[int] = None + """The number of highest probability vocabulary tokens to keep for top-k-filtering.""" + top_p: Optional[float] = None + """If set to float < 1, only the smallest set of most probable tokens with probabilities + that add up to top_p or higher are kept for generation. + """ + typical_p: Optional[float] = None + """Local typicality measures how similar the conditional probability of predicting a target + token next is to the expected conditional probability of predicting a random token next, + given the partial text already generated. If set to float < 1, the smallest set of the + most locally typical tokens with probabilities that add up to typical_p or higher are + kept for generation. See [this paper](https://hf.co/papers/2202.00666) for more details. 
+ """ + use_cache: Optional[bool] = None + """Whether the model should use the past last key/values attentions to speed up decoding""" + + +@dataclass_with_extra +class ImageToTextParameters(BaseInferenceType): + """Additional inference parameters for Image To Text""" + + generation_parameters: Optional[ImageToTextGenerationParameters] = None + """Parametrization of the text generation process""" + max_new_tokens: Optional[int] = None + """The amount of maximum tokens to generate.""" + + +@dataclass_with_extra +class ImageToTextInput(BaseInferenceType): + """Inputs for Image To Text inference""" + + inputs: Any + """The input image data""" + parameters: Optional[ImageToTextParameters] = None + """Additional inference parameters for Image To Text""" + + +@dataclass_with_extra +class ImageToTextOutput(BaseInferenceType): + """Outputs of inference for the Image To Text task""" + + generated_text: Any + image_to_text_output_generated_text: Optional[str] = None + """The generated text.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/image_to_video.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/image_to_video.py new file mode 100644 index 0000000000000000000000000000000000000000..92192a2a05b7a825c6dd55e96702fece0f3b3316 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/image_to_video.py @@ -0,0 +1,60 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Any, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +@dataclass_with_extra +class ImageToVideoTargetSize(BaseInferenceType): + """The size in pixel of the output video frames.""" + + height: int + width: int + + +@dataclass_with_extra +class ImageToVideoParameters(BaseInferenceType): + """Additional inference parameters for Image To Video""" + + guidance_scale: Optional[float] = None + """For diffusion models. A higher guidance scale value encourages the model to generate + videos closely linked to the text prompt at the expense of lower image quality. + """ + negative_prompt: Optional[str] = None + """One prompt to guide what NOT to include in video generation.""" + num_frames: Optional[float] = None + """The num_frames parameter determines how many video frames are generated.""" + num_inference_steps: Optional[int] = None + """The number of denoising steps. More denoising steps usually lead to a higher quality + video at the expense of slower inference. + """ + prompt: Optional[str] = None + """The text prompt to guide the video generation.""" + seed: Optional[int] = None + """Seed for the random number generator.""" + target_size: Optional[ImageToVideoTargetSize] = None + """The size in pixel of the output video frames.""" + + +@dataclass_with_extra +class ImageToVideoInput(BaseInferenceType): + """Inputs for Image To Video inference""" + + inputs: str + """The input image data as a base64-encoded string. If no `parameters` are provided, you can + also provide the image data as a raw bytes payload. 
+ """ + parameters: Optional[ImageToVideoParameters] = None + """Additional inference parameters for Image To Video""" + + +@dataclass_with_extra +class ImageToVideoOutput(BaseInferenceType): + """Outputs of inference for the Image To Video task""" + + video: Any + """The generated video returned as raw bytes in the payload.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/object_detection.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/object_detection.py new file mode 100644 index 0000000000000000000000000000000000000000..75f3ebcfe1199462d0df60879b5ba6e517f7001e --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/object_detection.py @@ -0,0 +1,58 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +@dataclass_with_extra +class ObjectDetectionParameters(BaseInferenceType): + """Additional inference parameters for Object Detection""" + + threshold: Optional[float] = None + """The probability necessary to make a prediction.""" + + +@dataclass_with_extra +class ObjectDetectionInput(BaseInferenceType): + """Inputs for Object Detection inference""" + + inputs: str + """The input image data as a base64-encoded string. If no `parameters` are provided, you can + also provide the image data as a raw bytes payload. + """ + parameters: Optional[ObjectDetectionParameters] = None + """Additional inference parameters for Object Detection""" + + +@dataclass_with_extra +class ObjectDetectionBoundingBox(BaseInferenceType): + """The predicted bounding box. Coordinates are relative to the top left corner of the input + image. + """ + + xmax: int + """The x-coordinate of the bottom-right corner of the bounding box.""" + xmin: int + """The x-coordinate of the top-left corner of the bounding box.""" + ymax: int + """The y-coordinate of the bottom-right corner of the bounding box.""" + ymin: int + """The y-coordinate of the top-left corner of the bounding box.""" + + +@dataclass_with_extra +class ObjectDetectionOutputElement(BaseInferenceType): + """Outputs of inference for the Object Detection task""" + + box: ObjectDetectionBoundingBox + """The predicted bounding box. Coordinates are relative to the top left corner of the input + image. + """ + label: str + """The predicted label for the bounding box.""" + score: float + """The associated score / probability.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/question_answering.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/question_answering.py new file mode 100644 index 0000000000000000000000000000000000000000..014ab41893c560a2c266bc04a1d60bc933be31c7 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/question_answering.py @@ -0,0 +1,74 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. 
+# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +@dataclass_with_extra +class QuestionAnsweringInputData(BaseInferenceType): + """One (context, question) pair to answer""" + + context: str + """The context to be used for answering the question""" + question: str + """The question to be answered""" + + +@dataclass_with_extra +class QuestionAnsweringParameters(BaseInferenceType): + """Additional inference parameters for Question Answering""" + + align_to_words: Optional[bool] = None + """Attempts to align the answer to real words. Improves quality on space-separated + languages. Might hurt on non-space-separated languages (like Japanese or Chinese). + """ + doc_stride: Optional[int] = None + """If the context is too long to fit with the question for the model, the context will be + split into several chunks with some overlap. This argument controls the size of that + overlap. + """ + handle_impossible_answer: Optional[bool] = None + """Whether to accept impossible as an answer.""" + max_answer_len: Optional[int] = None + """The maximum length of predicted answers (e.g., only answers with a shorter length are + considered). + """ + max_question_len: Optional[int] = None + """The maximum length of the question after tokenization. It will be truncated if needed.""" + max_seq_len: Optional[int] = None + """The maximum length of the total sentence (context + question) in tokens of each chunk + passed to the model. The context will be split into several chunks (using doc_stride as + overlap) if needed. + """ + top_k: Optional[int] = None + """The number of answers to return (will be chosen by order of likelihood). Note that fewer + than top_k answers will be returned if there are not enough options available within the + context. + """ + + +@dataclass_with_extra +class QuestionAnsweringInput(BaseInferenceType): + """Inputs for Question Answering inference""" + + inputs: QuestionAnsweringInputData + """One (context, question) pair to answer""" + parameters: Optional[QuestionAnsweringParameters] = None + """Additional inference parameters for Question Answering""" + + +@dataclass_with_extra +class QuestionAnsweringOutputElement(BaseInferenceType): + """Outputs of inference for the Question Answering task""" + + answer: str + """The answer to the question.""" + end: int + """The character position in the input where the answer ends.""" + score: float + """The probability associated with the answer.""" + start: int + """The character position in the input where the answer begins.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/sentence_similarity.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/sentence_similarity.py new file mode 100644 index 0000000000000000000000000000000000000000..66e8bb4d9322d4847556b7a17dc17bd208a37d0c --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/sentence_similarity.py @@ -0,0 +1,27 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks.
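# Hedged illustration (not part of the generated diff): one (context, question)
# pair with a couple of the optional parameters set. Values are placeholders;
# the import assumes the types package re-exports these names.
from huggingface_hub.inference._generated.types import (
    QuestionAnsweringInput,
    QuestionAnsweringInputData,
    QuestionAnsweringParameters,
)

payload = QuestionAnsweringInput(
    inputs=QuestionAnsweringInputData(
        context="The Eiffel Tower was completed in 1889.",
        question="When was the Eiffel Tower completed?",
    ),
    parameters=QuestionAnsweringParameters(top_k=1, max_answer_len=30),
)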
+from typing import Any, Dict, List, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +@dataclass_with_extra +class SentenceSimilarityInputData(BaseInferenceType): + sentences: List[str] + """A list of strings which will be compared against the source_sentence.""" + source_sentence: str + """The string that you wish to compare the other strings with. This can be a phrase, + sentence, or longer passage, depending on the model being used. + """ + + +@dataclass_with_extra +class SentenceSimilarityInput(BaseInferenceType): + """Inputs for Sentence similarity inference""" + + inputs: SentenceSimilarityInputData + parameters: Optional[Dict[str, Any]] = None + """Additional inference parameters for Sentence Similarity""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/summarization.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/summarization.py new file mode 100644 index 0000000000000000000000000000000000000000..33eae6fcba0e8724babf145f93be005868429c33 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/summarization.py @@ -0,0 +1,41 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Any, Dict, Literal, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +SummarizationTruncationStrategy = Literal["do_not_truncate", "longest_first", "only_first", "only_second"] + + +@dataclass_with_extra +class SummarizationParameters(BaseInferenceType): + """Additional inference parameters for summarization.""" + + clean_up_tokenization_spaces: Optional[bool] = None + """Whether to clean up the potential extra spaces in the text output.""" + generate_parameters: Optional[Dict[str, Any]] = None + """Additional parametrization of the text generation algorithm.""" + truncation: Optional["SummarizationTruncationStrategy"] = None + """The truncation strategy to use.""" + + +@dataclass_with_extra +class SummarizationInput(BaseInferenceType): + """Inputs for Summarization inference""" + + inputs: str + """The input text to summarize.""" + parameters: Optional[SummarizationParameters] = None + """Additional inference parameters for summarization.""" + + +@dataclass_with_extra +class SummarizationOutput(BaseInferenceType): + """Outputs of inference for the Summarization task""" + + summary_text: str + """The summarized text.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/table_question_answering.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/table_question_answering.py new file mode 100644 index 0000000000000000000000000000000000000000..10e208eeeb50a689d2826a160432a2b005ec006c --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/table_question_answering.py @@ -0,0 +1,62 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. 
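# Hedged illustration (not part of the generated diff): comparing one source
# sentence against candidate sentences with SentenceSimilarityInput. The import
# assumes the types package re-exports these names.
from huggingface_hub.inference._generated.types import (
    SentenceSimilarityInput,
    SentenceSimilarityInputData,
)

payload = SentenceSimilarityInput(
    inputs=SentenceSimilarityInputData(
        source_sentence="A cat sits on the mat.",
        sentences=["A feline rests on a rug.", "Stocks fell sharply today."],
    )
)
# The service would return one similarity score per candidate sentence.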
+from typing import Dict, List, Literal, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +@dataclass_with_extra +class TableQuestionAnsweringInputData(BaseInferenceType): + """One (table, question) pair to answer""" + + question: str + """The question to be answered about the table""" + table: Dict[str, List[str]] + """The table to serve as context for the questions""" + + +Padding = Literal["do_not_pad", "longest", "max_length"] + + +@dataclass_with_extra +class TableQuestionAnsweringParameters(BaseInferenceType): + """Additional inference parameters for Table Question Answering""" + + padding: Optional["Padding"] = None + """Activates and controls padding.""" + sequential: Optional[bool] = None + """Whether to do inference sequentially or as a batch. Batching is faster, but models like + SQA require the inference to be done sequentially to extract relations within sequences, + given their conversational nature. + """ + truncation: Optional[bool] = None + """Activates and controls truncation.""" + + +@dataclass_with_extra +class TableQuestionAnsweringInput(BaseInferenceType): + """Inputs for Table Question Answering inference""" + + inputs: TableQuestionAnsweringInputData + """One (table, question) pair to answer""" + parameters: Optional[TableQuestionAnsweringParameters] = None + """Additional inference parameters for Table Question Answering""" + + +@dataclass_with_extra +class TableQuestionAnsweringOutputElement(BaseInferenceType): + """Outputs of inference for the Table Question Answering task""" + + answer: str + """The answer of the question given the table. If there is an aggregator, the answer will be + preceded by `AGGREGATOR >`. + """ + cells: List[str] + """List of strings made up of the answer cell values.""" + coordinates: List[List[int]] + """Coordinates of the cells of the answers.""" + aggregator: Optional[str] = None + """If the model has an aggregator, this returns the aggregator.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text2text_generation.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text2text_generation.py new file mode 100644 index 0000000000000000000000000000000000000000..34ac74e21e8a30d889f1a251f648d4c365325be6 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text2text_generation.py @@ -0,0 +1,42 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. 
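# Hedged illustration (not part of the generated diff): a (table, question)
# pair; note the table maps each column name to a list of cell-value strings.
# Values are placeholders; the import assumes the types package re-exports
# these names.
from huggingface_hub.inference._generated.types import (
    TableQuestionAnsweringInput,
    TableQuestionAnsweringInputData,
    TableQuestionAnsweringParameters,
)

payload = TableQuestionAnsweringInput(
    inputs=TableQuestionAnsweringInputData(
        question="How many employees does ACME have?",
        table={
            "Company": ["ACME", "Globex"],
            "Employees": ["1200", "800"],  # cell values are strings
        },
    ),
    parameters=TableQuestionAnsweringParameters(truncation=True),
)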
+from typing import Any, Dict, Literal, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +Text2TextGenerationTruncationStrategy = Literal["do_not_truncate", "longest_first", "only_first", "only_second"] + + +@dataclass_with_extra +class Text2TextGenerationParameters(BaseInferenceType): + """Additional inference parameters for Text2text Generation""" + + clean_up_tokenization_spaces: Optional[bool] = None + """Whether to clean up the potential extra spaces in the text output.""" + generate_parameters: Optional[Dict[str, Any]] = None + """Additional parametrization of the text generation algorithm""" + truncation: Optional["Text2TextGenerationTruncationStrategy"] = None + """The truncation strategy to use""" + + +@dataclass_with_extra +class Text2TextGenerationInput(BaseInferenceType): + """Inputs for Text2text Generation inference""" + + inputs: str + """The input text data""" + parameters: Optional[Text2TextGenerationParameters] = None + """Additional inference parameters for Text2text Generation""" + + +@dataclass_with_extra +class Text2TextGenerationOutput(BaseInferenceType): + """Outputs of inference for the Text2text Generation task""" + + generated_text: Any + text2_text_generation_output_generated_text: Optional[str] = None + """The generated text.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_classification.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_classification.py new file mode 100644 index 0000000000000000000000000000000000000000..9a172b23f844fa58f757a644d52138a18e7b6ddb --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_classification.py @@ -0,0 +1,41 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. 
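# Hedged illustration (not part of the generated diff): a Text2text Generation
# request with a truncation strategy picked from the Literal values above. The
# import assumes the types package re-exports these names.
from huggingface_hub.inference._generated.types import (
    Text2TextGenerationInput,
    Text2TextGenerationParameters,
)

payload = Text2TextGenerationInput(
    inputs="translate English to German: The house is wonderful.",
    parameters=Text2TextGenerationParameters(
        clean_up_tokenization_spaces=True,
        truncation="longest_first",  # a Text2TextGenerationTruncationStrategy value
    ),
)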
+from typing import Literal, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +TextClassificationOutputTransform = Literal["sigmoid", "softmax", "none"] + + +@dataclass_with_extra +class TextClassificationParameters(BaseInferenceType): + """Additional inference parameters for Text Classification""" + + function_to_apply: Optional["TextClassificationOutputTransform"] = None + """The function to apply to the model outputs in order to retrieve the scores.""" + top_k: Optional[int] = None + """When specified, limits the output to the top K most probable classes.""" + + +@dataclass_with_extra +class TextClassificationInput(BaseInferenceType): + """Inputs for Text Classification inference""" + + inputs: str + """The text to classify""" + parameters: Optional[TextClassificationParameters] = None + """Additional inference parameters for Text Classification""" + + +@dataclass_with_extra +class TextClassificationOutputElement(BaseInferenceType): + """Outputs of inference for the Text Classification task""" + + label: str + """The predicted class label.""" + score: float + """The corresponding probability.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_generation.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_generation.py new file mode 100644 index 0000000000000000000000000000000000000000..9b79cc691dce3a6d42aef716d4a93a719f2d600c --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_generation.py @@ -0,0 +1,168 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Any, List, Literal, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +TypeEnum = Literal["json", "regex", "json_schema"] + + +@dataclass_with_extra +class TextGenerationInputGrammarType(BaseInferenceType): + type: "TypeEnum" + value: Any + """A string that represents a [JSON Schema](https://json-schema.org/). + JSON Schema is a declarative language that allows you to annotate JSON documents + with types and descriptions. + """ + + +@dataclass_with_extra +class TextGenerationInputGenerateParameters(BaseInferenceType): + adapter_id: Optional[str] = None + """Lora adapter id""" + best_of: Optional[int] = None + """Generate best_of sequences and return the one with the highest token logprobs.""" + decoder_input_details: Optional[bool] = None + """Whether to return decoder input token logprobs and ids.""" + details: Optional[bool] = None + """Whether to return generation details.""" + do_sample: Optional[bool] = None + """Activate logits sampling.""" + frequency_penalty: Optional[float] = None + """The parameter for frequency penalty. 1.0 means no penalty. + Penalize new tokens based on their existing frequency in the text so far, + decreasing the model's likelihood to repeat the same line verbatim. + """ + grammar: Optional[TextGenerationInputGrammarType] = None + max_new_tokens: Optional[int] = None + """Maximum number of tokens to generate.""" + repetition_penalty: Optional[float] = None + """The parameter for repetition penalty. 1.0 means no penalty. + See [this paper](https://arxiv.org/pdf/1909.05858.pdf) for more details.
+ """ + return_full_text: Optional[bool] = None + """Whether to prepend the prompt to the generated text""" + seed: Optional[int] = None + """Random sampling seed.""" + stop: Optional[List[str]] = None + """Stop generating tokens if a member of `stop` is generated.""" + temperature: Optional[float] = None + """The value used to module the logits distribution.""" + top_k: Optional[int] = None + """The number of highest probability vocabulary tokens to keep for top-k-filtering.""" + top_n_tokens: Optional[int] = None + """The number of highest probability vocabulary tokens to keep for top-n-filtering.""" + top_p: Optional[float] = None + """Top-p value for nucleus sampling.""" + truncate: Optional[int] = None + """Truncate inputs tokens to the given size.""" + typical_p: Optional[float] = None + """Typical Decoding mass + See [Typical Decoding for Natural Language Generation](https://arxiv.org/abs/2202.00666) + for more information. + """ + watermark: Optional[bool] = None + """Watermarking with [A Watermark for Large Language + Models](https://arxiv.org/abs/2301.10226). + """ + + +@dataclass_with_extra +class TextGenerationInput(BaseInferenceType): + """Text Generation Input. + Auto-generated from TGI specs. + For more details, check out + https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-tgi-import.ts. + """ + + inputs: str + parameters: Optional[TextGenerationInputGenerateParameters] = None + stream: Optional[bool] = None + + +TextGenerationOutputFinishReason = Literal["length", "eos_token", "stop_sequence"] + + +@dataclass_with_extra +class TextGenerationOutputPrefillToken(BaseInferenceType): + id: int + logprob: float + text: str + + +@dataclass_with_extra +class TextGenerationOutputToken(BaseInferenceType): + id: int + logprob: float + special: bool + text: str + + +@dataclass_with_extra +class TextGenerationOutputBestOfSequence(BaseInferenceType): + finish_reason: "TextGenerationOutputFinishReason" + generated_text: str + generated_tokens: int + prefill: List[TextGenerationOutputPrefillToken] + tokens: List[TextGenerationOutputToken] + seed: Optional[int] = None + top_tokens: Optional[List[List[TextGenerationOutputToken]]] = None + + +@dataclass_with_extra +class TextGenerationOutputDetails(BaseInferenceType): + finish_reason: "TextGenerationOutputFinishReason" + generated_tokens: int + prefill: List[TextGenerationOutputPrefillToken] + tokens: List[TextGenerationOutputToken] + best_of_sequences: Optional[List[TextGenerationOutputBestOfSequence]] = None + seed: Optional[int] = None + top_tokens: Optional[List[List[TextGenerationOutputToken]]] = None + + +@dataclass_with_extra +class TextGenerationOutput(BaseInferenceType): + """Text Generation Output. + Auto-generated from TGI specs. + For more details, check out + https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-tgi-import.ts. + """ + + generated_text: str + details: Optional[TextGenerationOutputDetails] = None + + +@dataclass_with_extra +class TextGenerationStreamOutputStreamDetails(BaseInferenceType): + finish_reason: "TextGenerationOutputFinishReason" + generated_tokens: int + input_length: int + seed: Optional[int] = None + + +@dataclass_with_extra +class TextGenerationStreamOutputToken(BaseInferenceType): + id: int + logprob: float + special: bool + text: str + + +@dataclass_with_extra +class TextGenerationStreamOutput(BaseInferenceType): + """Text Generation Stream Output. + Auto-generated from TGI specs. 
+ For more details, check out + https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-tgi-import.ts. + """ + + index: int + token: TextGenerationStreamOutputToken + details: Optional[TextGenerationStreamOutputStreamDetails] = None + generated_text: Optional[str] = None + top_tokens: Optional[List[TextGenerationStreamOutputToken]] = None diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_to_audio.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_to_audio.py new file mode 100644 index 0000000000000000000000000000000000000000..87af80a598af70800b8386f034c65de0b397479e --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_to_audio.py @@ -0,0 +1,99 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Any, Literal, Optional, Union + +from .base import BaseInferenceType, dataclass_with_extra + + +TextToAudioEarlyStoppingEnum = Literal["never"] + + +@dataclass_with_extra +class TextToAudioGenerationParameters(BaseInferenceType): + """Parametrization of the text generation process""" + + do_sample: Optional[bool] = None + """Whether to use sampling instead of greedy decoding when generating new tokens.""" + early_stopping: Optional[Union[bool, "TextToAudioEarlyStoppingEnum"]] = None + """Controls the stopping condition for beam-based methods.""" + epsilon_cutoff: Optional[float] = None + """If set to float strictly between 0 and 1, only tokens with a conditional probability + greater than epsilon_cutoff will be sampled. In the paper, suggested values range from + 3e-4 to 9e-4, depending on the size of the model. See [Truncation Sampling as Language + Model Desmoothing](https://hf.co/papers/2210.15191) for more details. + """ + eta_cutoff: Optional[float] = None + """Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to + float strictly between 0 and 1, a token is only considered if it is greater than either + eta_cutoff or sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits))). The latter + term is intuitively the expected next token probability, scaled by sqrt(eta_cutoff). In + the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model. + See [Truncation Sampling as Language Model Desmoothing](https://hf.co/papers/2210.15191) + for more details. + """ + max_length: Optional[int] = None + """The maximum length (in tokens) of the generated text, including the input.""" + max_new_tokens: Optional[int] = None + """The maximum number of tokens to generate. Takes precedence over max_length.""" + min_length: Optional[int] = None + """The minimum length (in tokens) of the generated text, including the input.""" + min_new_tokens: Optional[int] = None + """The minimum number of tokens to generate. Takes precedence over min_length.""" + num_beam_groups: Optional[int] = None + """Number of groups to divide num_beams into in order to ensure diversity among different + groups of beams. See [this paper](https://hf.co/papers/1610.02424) for more details. 
+ """ + num_beams: Optional[int] = None + """Number of beams to use for beam search.""" + penalty_alpha: Optional[float] = None + """The value balances the model confidence and the degeneration penalty in contrastive + search decoding. + """ + temperature: Optional[float] = None + """The value used to modulate the next token probabilities.""" + top_k: Optional[int] = None + """The number of highest probability vocabulary tokens to keep for top-k-filtering.""" + top_p: Optional[float] = None + """If set to float < 1, only the smallest set of most probable tokens with probabilities + that add up to top_p or higher are kept for generation. + """ + typical_p: Optional[float] = None + """Local typicality measures how similar the conditional probability of predicting a target + token next is to the expected conditional probability of predicting a random token next, + given the partial text already generated. If set to float < 1, the smallest set of the + most locally typical tokens with probabilities that add up to typical_p or higher are + kept for generation. See [this paper](https://hf.co/papers/2202.00666) for more details. + """ + use_cache: Optional[bool] = None + """Whether the model should use the past last key/values attentions to speed up decoding""" + + +@dataclass_with_extra +class TextToAudioParameters(BaseInferenceType): + """Additional inference parameters for Text To Audio""" + + generation_parameters: Optional[TextToAudioGenerationParameters] = None + """Parametrization of the text generation process""" + + +@dataclass_with_extra +class TextToAudioInput(BaseInferenceType): + """Inputs for Text To Audio inference""" + + inputs: str + """The input text data""" + parameters: Optional[TextToAudioParameters] = None + """Additional inference parameters for Text To Audio""" + + +@dataclass_with_extra +class TextToAudioOutput(BaseInferenceType): + """Outputs of inference for the Text To Audio task""" + + audio: Any + """The generated audio waveform.""" + sampling_rate: float + """The sampling rate of the generated audio waveform.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_to_image.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_to_image.py new file mode 100644 index 0000000000000000000000000000000000000000..20c963731371339975019ca5d40c95303d79209b --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_to_image.py @@ -0,0 +1,50 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Any, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +@dataclass_with_extra +class TextToImageParameters(BaseInferenceType): + """Additional inference parameters for Text To Image""" + + guidance_scale: Optional[float] = None + """A higher guidance scale value encourages the model to generate images closely linked to + the text prompt, but values too high may cause saturation and other artifacts. 
+ """ + height: Optional[int] = None + """The height in pixels of the output image""" + negative_prompt: Optional[str] = None + """One prompt to guide what NOT to include in image generation.""" + num_inference_steps: Optional[int] = None + """The number of denoising steps. More denoising steps usually lead to a higher quality + image at the expense of slower inference. + """ + scheduler: Optional[str] = None + """Override the scheduler with a compatible one.""" + seed: Optional[int] = None + """Seed for the random number generator.""" + width: Optional[int] = None + """The width in pixels of the output image""" + + +@dataclass_with_extra +class TextToImageInput(BaseInferenceType): + """Inputs for Text To Image inference""" + + inputs: str + """The input text data (sometimes called "prompt")""" + parameters: Optional[TextToImageParameters] = None + """Additional inference parameters for Text To Image""" + + +@dataclass_with_extra +class TextToImageOutput(BaseInferenceType): + """Outputs of inference for the Text To Image task""" + + image: Any + """The generated image returned as raw bytes in the payload.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_to_speech.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_to_speech.py new file mode 100644 index 0000000000000000000000000000000000000000..ce2db8f3f901cc99b5d2fcbb362c4b07b2a718e0 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_to_speech.py @@ -0,0 +1,99 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Any, Literal, Optional, Union + +from .base import BaseInferenceType, dataclass_with_extra + + +TextToSpeechEarlyStoppingEnum = Literal["never"] + + +@dataclass_with_extra +class TextToSpeechGenerationParameters(BaseInferenceType): + """Parametrization of the text generation process""" + + do_sample: Optional[bool] = None + """Whether to use sampling instead of greedy decoding when generating new tokens.""" + early_stopping: Optional[Union[bool, "TextToSpeechEarlyStoppingEnum"]] = None + """Controls the stopping condition for beam-based methods.""" + epsilon_cutoff: Optional[float] = None + """If set to float strictly between 0 and 1, only tokens with a conditional probability + greater than epsilon_cutoff will be sampled. In the paper, suggested values range from + 3e-4 to 9e-4, depending on the size of the model. See [Truncation Sampling as Language + Model Desmoothing](https://hf.co/papers/2210.15191) for more details. + """ + eta_cutoff: Optional[float] = None + """Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to + float strictly between 0 and 1, a token is only considered if it is greater than either + eta_cutoff or sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits))). The latter + term is intuitively the expected next token probability, scaled by sqrt(eta_cutoff). In + the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model. + See [Truncation Sampling as Language Model Desmoothing](https://hf.co/papers/2210.15191) + for more details. 
+ """ + max_length: Optional[int] = None + """The maximum length (in tokens) of the generated text, including the input.""" + max_new_tokens: Optional[int] = None + """The maximum number of tokens to generate. Takes precedence over max_length.""" + min_length: Optional[int] = None + """The minimum length (in tokens) of the generated text, including the input.""" + min_new_tokens: Optional[int] = None + """The minimum number of tokens to generate. Takes precedence over min_length.""" + num_beam_groups: Optional[int] = None + """Number of groups to divide num_beams into in order to ensure diversity among different + groups of beams. See [this paper](https://hf.co/papers/1610.02424) for more details. + """ + num_beams: Optional[int] = None + """Number of beams to use for beam search.""" + penalty_alpha: Optional[float] = None + """The value balances the model confidence and the degeneration penalty in contrastive + search decoding. + """ + temperature: Optional[float] = None + """The value used to modulate the next token probabilities.""" + top_k: Optional[int] = None + """The number of highest probability vocabulary tokens to keep for top-k-filtering.""" + top_p: Optional[float] = None + """If set to float < 1, only the smallest set of most probable tokens with probabilities + that add up to top_p or higher are kept for generation. + """ + typical_p: Optional[float] = None + """Local typicality measures how similar the conditional probability of predicting a target + token next is to the expected conditional probability of predicting a random token next, + given the partial text already generated. If set to float < 1, the smallest set of the + most locally typical tokens with probabilities that add up to typical_p or higher are + kept for generation. See [this paper](https://hf.co/papers/2202.00666) for more details. + """ + use_cache: Optional[bool] = None + """Whether the model should use the past last key/values attentions to speed up decoding""" + + +@dataclass_with_extra +class TextToSpeechParameters(BaseInferenceType): + """Additional inference parameters for Text To Speech""" + + generation_parameters: Optional[TextToSpeechGenerationParameters] = None + """Parametrization of the text generation process""" + + +@dataclass_with_extra +class TextToSpeechInput(BaseInferenceType): + """Inputs for Text To Speech inference""" + + inputs: str + """The input text data""" + parameters: Optional[TextToSpeechParameters] = None + """Additional inference parameters for Text To Speech""" + + +@dataclass_with_extra +class TextToSpeechOutput(BaseInferenceType): + """Outputs of inference for the Text To Speech task""" + + audio: Any + """The generated audio""" + sampling_rate: Optional[float] = None + """The sampling rate of the generated audio waveform.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_to_video.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_to_video.py new file mode 100644 index 0000000000000000000000000000000000000000..e54a1bc094e4aaf7132e502aa268bc052ab34f0a --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/text_to_video.py @@ -0,0 +1,46 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. 
+#
+# See:
+# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts
+# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks.
+from typing import Any, List, Optional
+
+from .base import BaseInferenceType, dataclass_with_extra
+
+
+@dataclass_with_extra
+class TextToVideoParameters(BaseInferenceType):
+    """Additional inference parameters for Text To Video"""
+
+    guidance_scale: Optional[float] = None
+    """A higher guidance scale value encourages the model to generate videos closely linked to
+    the text prompt, but values too high may cause saturation and other artifacts.
+    """
+    negative_prompt: Optional[List[str]] = None
+    """One or several prompts to guide what NOT to include in video generation."""
+    num_frames: Optional[float] = None
+    """The number of video frames to generate."""
+    num_inference_steps: Optional[int] = None
+    """The number of denoising steps. More denoising steps usually lead to a higher quality
+    video at the expense of slower inference.
+    """
+    seed: Optional[int] = None
+    """Seed for the random number generator."""
+
+
+@dataclass_with_extra
+class TextToVideoInput(BaseInferenceType):
+    """Inputs for Text To Video inference"""
+
+    inputs: str
+    """The input text data (sometimes called "prompt")"""
+    parameters: Optional[TextToVideoParameters] = None
+    """Additional inference parameters for Text To Video"""
+
+
+@dataclass_with_extra
+class TextToVideoOutput(BaseInferenceType):
+    """Outputs of inference for the Text To Video task"""
+
+    video: Any
+    """The generated video returned as raw bytes in the payload."""
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/token_classification.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/token_classification.py
new file mode 100644
index 0000000000000000000000000000000000000000..e039b6a1db7dcd54dbc9434d3254da0770c6799e
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/token_classification.py
@@ -0,0 +1,51 @@
+# Inference code generated from the JSON schema spec in @huggingface/tasks.
+#
+# See:
+# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts
+# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks.
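+#
+# Example (illustrative sketch): a minimal request built from the types defined
+# below. Values are made up; only the field names and the allowed
+# `aggregation_strategy` literals come from this module.
+#
+#     from huggingface_hub.inference._generated.types.token_classification import (
+#         TokenClassificationInput,
+#         TokenClassificationParameters,
+#     )
+#
+#     payload = TokenClassificationInput(
+#         inputs="Hugging Face is based in New York City.",
+#         parameters=TokenClassificationParameters(aggregation_strategy="simple"),
+#     )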
+from typing import List, Literal, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +TokenClassificationAggregationStrategy = Literal["none", "simple", "first", "average", "max"] + + +@dataclass_with_extra +class TokenClassificationParameters(BaseInferenceType): + """Additional inference parameters for Token Classification""" + + aggregation_strategy: Optional["TokenClassificationAggregationStrategy"] = None + """The strategy used to fuse tokens based on model predictions""" + ignore_labels: Optional[List[str]] = None + """A list of labels to ignore""" + stride: Optional[int] = None + """The number of overlapping tokens between chunks when splitting the input text.""" + + +@dataclass_with_extra +class TokenClassificationInput(BaseInferenceType): + """Inputs for Token Classification inference""" + + inputs: str + """The input text data""" + parameters: Optional[TokenClassificationParameters] = None + """Additional inference parameters for Token Classification""" + + +@dataclass_with_extra +class TokenClassificationOutputElement(BaseInferenceType): + """Outputs of inference for the Token Classification task""" + + end: int + """The character position in the input where this group ends.""" + score: float + """The associated score / probability""" + start: int + """The character position in the input where this group begins.""" + word: str + """The corresponding text""" + entity: Optional[str] = None + """The predicted label for a single token""" + entity_group: Optional[str] = None + """The predicted label for a group of one or more tokens""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/translation.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/translation.py new file mode 100644 index 0000000000000000000000000000000000000000..df95b7dbb1f4ce5b80cec034e004bb6e71387be8 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/translation.py @@ -0,0 +1,49 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Any, Dict, Literal, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +TranslationTruncationStrategy = Literal["do_not_truncate", "longest_first", "only_first", "only_second"] + + +@dataclass_with_extra +class TranslationParameters(BaseInferenceType): + """Additional inference parameters for Translation""" + + clean_up_tokenization_spaces: Optional[bool] = None + """Whether to clean up the potential extra spaces in the text output.""" + generate_parameters: Optional[Dict[str, Any]] = None + """Additional parametrization of the text generation algorithm.""" + src_lang: Optional[str] = None + """The source language of the text. Required for models that can translate from multiple + languages. + """ + tgt_lang: Optional[str] = None + """Target language to translate to. Required for models that can translate to multiple + languages. 
+ """ + truncation: Optional["TranslationTruncationStrategy"] = None + """The truncation strategy to use.""" + + +@dataclass_with_extra +class TranslationInput(BaseInferenceType): + """Inputs for Translation inference""" + + inputs: str + """The text to translate.""" + parameters: Optional[TranslationParameters] = None + """Additional inference parameters for Translation""" + + +@dataclass_with_extra +class TranslationOutput(BaseInferenceType): + """Outputs of inference for the Translation task""" + + translation_text: str + """The translated text.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/video_classification.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/video_classification.py new file mode 100644 index 0000000000000000000000000000000000000000..e1d7a15bb4ee5fa63aa6ebc3750191bd38549212 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/video_classification.py @@ -0,0 +1,45 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import Any, Literal, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +VideoClassificationOutputTransform = Literal["sigmoid", "softmax", "none"] + + +@dataclass_with_extra +class VideoClassificationParameters(BaseInferenceType): + """Additional inference parameters for Video Classification""" + + frame_sampling_rate: Optional[int] = None + """The sampling rate used to select frames from the video.""" + function_to_apply: Optional["VideoClassificationOutputTransform"] = None + """The function to apply to the model outputs in order to retrieve the scores.""" + num_frames: Optional[int] = None + """The number of sampled frames to consider for classification.""" + top_k: Optional[int] = None + """When specified, limits the output to the top K most probable classes.""" + + +@dataclass_with_extra +class VideoClassificationInput(BaseInferenceType): + """Inputs for Video Classification inference""" + + inputs: Any + """The input video data""" + parameters: Optional[VideoClassificationParameters] = None + """Additional inference parameters for Video Classification""" + + +@dataclass_with_extra +class VideoClassificationOutputElement(BaseInferenceType): + """Outputs of inference for the Video Classification task""" + + label: str + """The predicted class label.""" + score: float + """The corresponding probability.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/visual_question_answering.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/visual_question_answering.py new file mode 100644 index 0000000000000000000000000000000000000000..d368f1621289bc11a17be3e590cf8a040019d455 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/visual_question_answering.py @@ -0,0 +1,49 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. 
+from typing import Any, Optional
+
+from .base import BaseInferenceType, dataclass_with_extra
+
+
+@dataclass_with_extra
+class VisualQuestionAnsweringInputData(BaseInferenceType):
+    """One (image, question) pair to answer"""
+
+    image: Any
+    """The image."""
+    question: str
+    """The question to answer based on the image."""
+
+
+@dataclass_with_extra
+class VisualQuestionAnsweringParameters(BaseInferenceType):
+    """Additional inference parameters for Visual Question Answering"""
+
+    top_k: Optional[int] = None
+    """The number of answers to return (will be chosen by order of likelihood). Note that we
+    return fewer than top_k answers if there are not enough options available within the
+    context.
+    """
+
+
+@dataclass_with_extra
+class VisualQuestionAnsweringInput(BaseInferenceType):
+    """Inputs for Visual Question Answering inference"""
+
+    inputs: VisualQuestionAnsweringInputData
+    """One (image, question) pair to answer"""
+    parameters: Optional[VisualQuestionAnsweringParameters] = None
+    """Additional inference parameters for Visual Question Answering"""
+
+
+@dataclass_with_extra
+class VisualQuestionAnsweringOutputElement(BaseInferenceType):
+    """Outputs of inference for the Visual Question Answering task"""
+
+    score: float
+    """The associated score / probability"""
+    answer: Optional[str] = None
+    """The answer to the question"""
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/zero_shot_classification.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/zero_shot_classification.py
new file mode 100644
index 0000000000000000000000000000000000000000..47b32492e358edcc0de6aa09d53635b0a8156b25
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/zero_shot_classification.py
@@ -0,0 +1,45 @@
+# Inference code generated from the JSON schema spec in @huggingface/tasks.
+#
+# See:
+# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts
+# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks.
+from typing import List, Optional
+
+from .base import BaseInferenceType, dataclass_with_extra
+
+
+@dataclass_with_extra
+class ZeroShotClassificationParameters(BaseInferenceType):
+    """Additional inference parameters for Zero Shot Classification"""
+
+    candidate_labels: List[str]
+    """The set of possible class labels to classify the text into."""
+    hypothesis_template: Optional[str] = None
+    """The sentence used in conjunction with `candidate_labels` to attempt the text
+    classification by replacing the placeholder with the candidate labels.
+    """
+    multi_label: Optional[bool] = None
+    """Whether multiple candidate labels can be true. If false, the scores are normalized such
+    that the sum of the label likelihoods for each sequence is 1. If true, the labels are
+    considered independent and probabilities are normalized for each candidate.
+ """ + + +@dataclass_with_extra +class ZeroShotClassificationInput(BaseInferenceType): + """Inputs for Zero Shot Classification inference""" + + inputs: str + """The text to classify""" + parameters: ZeroShotClassificationParameters + """Additional inference parameters for Zero Shot Classification""" + + +@dataclass_with_extra +class ZeroShotClassificationOutputElement(BaseInferenceType): + """Outputs of inference for the Zero Shot Classification task""" + + label: str + """The predicted class label.""" + score: float + """The corresponding probability.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/zero_shot_image_classification.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/zero_shot_image_classification.py new file mode 100644 index 0000000000000000000000000000000000000000..998d66b6b4e3356f0f09a0ad25ebdaf2e76cd03f --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/zero_shot_image_classification.py @@ -0,0 +1,40 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. +from typing import List, Optional + +from .base import BaseInferenceType, dataclass_with_extra + + +@dataclass_with_extra +class ZeroShotImageClassificationParameters(BaseInferenceType): + """Additional inference parameters for Zero Shot Image Classification""" + + candidate_labels: List[str] + """The candidate labels for this image""" + hypothesis_template: Optional[str] = None + """The sentence used in conjunction with `candidate_labels` to attempt the image + classification by replacing the placeholder with the candidate labels. + """ + + +@dataclass_with_extra +class ZeroShotImageClassificationInput(BaseInferenceType): + """Inputs for Zero Shot Image Classification inference""" + + inputs: str + """The input image data to classify as a base64-encoded string.""" + parameters: ZeroShotImageClassificationParameters + """Additional inference parameters for Zero Shot Image Classification""" + + +@dataclass_with_extra +class ZeroShotImageClassificationOutputElement(BaseInferenceType): + """Outputs of inference for the Zero Shot Image Classification task""" + + label: str + """The predicted class label.""" + score: float + """The corresponding probability.""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/zero_shot_object_detection.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/zero_shot_object_detection.py new file mode 100644 index 0000000000000000000000000000000000000000..8ef76b5fcb93e8126266e4b1464934d01024b1b7 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_generated/types/zero_shot_object_detection.py @@ -0,0 +1,52 @@ +# Inference code generated from the JSON schema spec in @huggingface/tasks. +# +# See: +# - script: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/scripts/inference-codegen.ts +# - specs: https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks. 
+from typing import List + +from .base import BaseInferenceType, dataclass_with_extra + + +@dataclass_with_extra +class ZeroShotObjectDetectionParameters(BaseInferenceType): + """Additional inference parameters for Zero Shot Object Detection""" + + candidate_labels: List[str] + """The candidate labels for this image""" + + +@dataclass_with_extra +class ZeroShotObjectDetectionInput(BaseInferenceType): + """Inputs for Zero Shot Object Detection inference""" + + inputs: str + """The input image data as a base64-encoded string.""" + parameters: ZeroShotObjectDetectionParameters + """Additional inference parameters for Zero Shot Object Detection""" + + +@dataclass_with_extra +class ZeroShotObjectDetectionBoundingBox(BaseInferenceType): + """The predicted bounding box. Coordinates are relative to the top left corner of the input + image. + """ + + xmax: int + xmin: int + ymax: int + ymin: int + + +@dataclass_with_extra +class ZeroShotObjectDetectionOutputElement(BaseInferenceType): + """Outputs of inference for the Zero Shot Object Detection task""" + + box: ZeroShotObjectDetectionBoundingBox + """The predicted bounding box. Coordinates are relative to the top left corner of the input + image. + """ + label: str + """A candidate label""" + score: float + """The associated score / probability""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..bfe47723749fd39c659fb45e01824e3be1f4ca19 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/_cli_hacks.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/_cli_hacks.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e5b70676ec3437923a6939cac64320698367dd06 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/_cli_hacks.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/agent.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/agent.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..a7231a75a23d60b7f719a763cc940ba66cdc78c0 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/agent.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/cli.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/cli.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5b4c3bde988774e3376f629f821450c32f193c97 Binary files /dev/null and 
b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/cli.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/constants.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/constants.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..b9daf4515ba91470664b3bfa01ba76e205b920b3 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/constants.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/mcp_client.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/mcp_client.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..a337d6fab140bce92394227bf86727ac8429972d Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/mcp_client.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/types.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/types.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..f05a1920f8b5ad1a47a24d1c1e494ffd627597ba Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/types.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/utils.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/utils.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..b9d43217f111db3ef2e4e24bc6408af7d1536295 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/__pycache__/utils.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/_cli_hacks.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/_cli_hacks.py new file mode 100644 index 0000000000000000000000000000000000000000..64251bbb745dc3b4b561f0eb249be65108b20d82 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/_cli_hacks.py @@ -0,0 +1,88 @@ +import asyncio +import sys +from functools import partial + +import typer + + +def _patch_anyio_open_process(): + """ + Patch anyio.open_process to allow detached processes on Windows and Unix-like systems. + + This is necessary to prevent the MCP client from being interrupted by Ctrl+C when running in the CLI. + """ + import subprocess + + import anyio + + if getattr(anyio, "_tiny_agents_patched", False): + return + anyio._tiny_agents_patched = True # ty: ignore[invalid-assignment] + + original_open_process = anyio.open_process + + if sys.platform == "win32": + # On Windows, we need to set the creation flags to create a new process group + + async def open_process_in_new_group(*args, **kwargs): + """ + Wrapper for open_process to handle Windows-specific process creation flags. 
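+
+            With CREATE_NEW_PROCESS_GROUP the child becomes the root of its own
+            process group, so a Ctrl+C typed in the CLI console is not delivered to it.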
+ """ + # Ensure we pass the creation flags for Windows + kwargs.setdefault("creationflags", subprocess.CREATE_NEW_PROCESS_GROUP) + return await original_open_process(*args, **kwargs) + + anyio.open_process = open_process_in_new_group # ty: ignore[invalid-assignment] + else: + # For Unix-like systems, we can use setsid to create a new session + async def open_process_in_new_group(*args, **kwargs): + """ + Wrapper for open_process to handle Unix-like systems with start_new_session=True. + """ + kwargs.setdefault("start_new_session", True) + return await original_open_process(*args, **kwargs) + + anyio.open_process = open_process_in_new_group # ty: ignore[invalid-assignment] + + +async def _async_prompt(exit_event: asyncio.Event, prompt: str = "» ") -> str: + """ + Asynchronous prompt function that reads input from stdin without blocking. + + This function is designed to work in an asynchronous context, allowing the event loop to gracefully stop it (e.g. on Ctrl+C). + + Alternatively, we could use https://github.com/vxgmichel/aioconsole but that would be an additional dependency. + """ + loop = asyncio.get_event_loop() + + if sys.platform == "win32": + # Windows: Use run_in_executor to avoid blocking the event loop + # Degraded solution: this is not ideal as user will have to CTRL+C once more to stop the prompt (and it'll not be graceful) + return await loop.run_in_executor(None, partial(typer.prompt, prompt, prompt_suffix=" ")) + else: + # UNIX-like: Use loop.add_reader for non-blocking stdin read + future = loop.create_future() + + def on_input(): + line = sys.stdin.readline() + loop.remove_reader(sys.stdin) + future.set_result(line) + + print(prompt, end=" ", flush=True) + loop.add_reader(sys.stdin, on_input) # not supported on Windows + + # Wait for user input or exit event + # Wait until either the user hits enter or exit_event is set + exit_task = asyncio.create_task(exit_event.wait()) + await asyncio.wait( + [future, exit_task], + return_when=asyncio.FIRST_COMPLETED, + ) + + # Check which one has been triggered + if exit_event.is_set(): + future.cancel() + return "" + + line = await future + return line.strip() diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/agent.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/agent.py new file mode 100644 index 0000000000000000000000000000000000000000..b9eb347ed60a7178caecc8d54d4b6b2593d80884 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/agent.py @@ -0,0 +1,100 @@ +from __future__ import annotations + +import asyncio +from typing import AsyncGenerator, Dict, Iterable, List, Optional, Union + +from huggingface_hub import ChatCompletionInputMessage, ChatCompletionStreamOutput, MCPClient + +from .._providers import PROVIDER_OR_POLICY_T +from .constants import DEFAULT_SYSTEM_PROMPT, EXIT_LOOP_TOOLS, MAX_NUM_TURNS +from .types import ServerConfig + + +class Agent(MCPClient): + """ + Implementation of a Simple Agent, which is a simple while loop built right on top of an [`MCPClient`]. + + > [!WARNING] + > This class is experimental and might be subject to breaking changes in the future without prior notice. + + Args: + model (`str`, *optional*): + The model to run inference with. Can be a model id hosted on the Hugging Face Hub, e.g. `meta-llama/Meta-Llama-3-8B-Instruct` + or a URL to a deployed Inference Endpoint or other local or remote endpoint. + servers (`Iterable[Dict]`): + MCP servers to connect to. 
Each server is a dictionary with a `type` key (one of `"stdio"`, `"sse"` or `"http"`) plus the connection parameters expected by [`MCPClient.add_mcp_server`] for that type.
+        provider (`str`, *optional*):
+            Name of the provider to use for inference. Defaults to "auto" i.e. the first of the providers available for the model, sorted by the user's order in https://hf.co/settings/inference-providers.
+            If model is a URL or `base_url` is passed, then `provider` is not used.
+        base_url (`str`, *optional*):
+            The base URL to run inference. Defaults to None.
+        api_key (`str`, *optional*):
+            Token to use for authentication. Will default to the locally saved Hugging Face token if not provided. You can also use your own provider API key to interact directly with the provider's service.
+        prompt (`str`, *optional*):
+            The system prompt to use for the agent. Defaults to the default system prompt in `constants.py`.
+    """
+
+    def __init__(
+        self,
+        *,
+        model: Optional[str] = None,
+        servers: Iterable[ServerConfig],
+        provider: Optional[PROVIDER_OR_POLICY_T] = None,
+        base_url: Optional[str] = None,
+        api_key: Optional[str] = None,
+        prompt: Optional[str] = None,
+    ):
+        super().__init__(model=model, provider=provider, base_url=base_url, api_key=api_key)
+        self._servers_cfg = list(servers)
+        self.messages: List[Union[Dict, ChatCompletionInputMessage]] = [
+            {"role": "system", "content": prompt or DEFAULT_SYSTEM_PROMPT}
+        ]
+
+    async def load_tools(self) -> None:
+        for cfg in self._servers_cfg:
+            await self.add_mcp_server(**cfg)
+
+    async def run(
+        self,
+        user_input: str,
+        *,
+        abort_event: Optional[asyncio.Event] = None,
+    ) -> AsyncGenerator[Union[ChatCompletionStreamOutput, ChatCompletionInputMessage], None]:
+        """
+        Run the agent with the given user input.
+
+        Args:
+            user_input (`str`):
+                The user input to run the agent with.
+            abort_event (`asyncio.Event`, *optional*):
+                An event that can be used to abort the agent. If the event is set, the agent will stop running.
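+
+        Example (illustrative sketch; assumes an `Agent` created with at least one
+        MCP server and whose tools were already loaded via `load_tools()`):
+
+        ```python
+        >>> async for item in agent.run("List the files on my Desktop"):
+        ...     print(item)
+        ```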
+ """ + self.messages.append({"role": "user", "content": user_input}) + + num_turns: int = 0 + next_turn_should_call_tools = True + + while True: + if abort_event and abort_event.is_set(): + return + + async for item in self.process_single_turn_with_tools( + self.messages, + exit_loop_tools=EXIT_LOOP_TOOLS, + exit_if_first_chunk_no_tool=(num_turns > 0 and next_turn_should_call_tools), + ): + yield item + + num_turns += 1 + last = self.messages[-1] + + if last.get("role") == "tool" and last.get("name") in {t.function.name for t in EXIT_LOOP_TOOLS}: + return + + if last.get("role") != "tool" and num_turns > MAX_NUM_TURNS: + return + + if last.get("role") != "tool" and next_turn_should_call_tools: + return + + next_turn_should_call_tools = last.get("role") != "tool" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/cli.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/cli.py new file mode 100644 index 0000000000000000000000000000000000000000..a8aaea687a2b372e5379f09dffc219e5ea5b38b8 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/cli.py @@ -0,0 +1,247 @@ +import asyncio +import os +import signal +import traceback +from typing import Optional + +import typer +from rich import print + +from ._cli_hacks import _async_prompt, _patch_anyio_open_process +from .agent import Agent +from .utils import _load_agent_config + + +app = typer.Typer( + rich_markup_mode="rich", + help="A squad of lightweight composable AI applications built on Hugging Face's Inference Client and MCP stack.", +) + +run_cli = typer.Typer( + name="run", + help="Run the Agent in the CLI", + invoke_without_command=True, +) +app.add_typer(run_cli, name="run") + + +async def run_agent( + agent_path: Optional[str], +) -> None: + """ + Tiny Agent loop. + + Args: + agent_path (`str`, *optional*): + Path to a local folder containing an `agent.json` and optionally a custom `PROMPT.md` or `AGENTS.md` file or a built-in agent stored in a Hugging Face dataset. + + """ + _patch_anyio_open_process() # Hacky way to prevent stdio connections to be stopped by Ctrl+C + + config, prompt = _load_agent_config(agent_path) + + inputs = config.get("inputs", []) + servers = config.get("servers", []) + + abort_event = asyncio.Event() + exit_event = asyncio.Event() + first_sigint = True + + loop = asyncio.get_running_loop() + original_sigint_handler = signal.getsignal(signal.SIGINT) + + def _sigint_handler() -> None: + nonlocal first_sigint + if first_sigint: + first_sigint = False + abort_event.set() + print("\n[red]Interrupted. Press Ctrl+C again to quit.[/red]", flush=True) + return + + print("\n[red]Exiting...[/red]", flush=True) + exit_event.set() + + try: + sigint_registered_in_loop = False + try: + loop.add_signal_handler(signal.SIGINT, _sigint_handler) + sigint_registered_in_loop = True + except (AttributeError, NotImplementedError): + # Windows (or any loop that doesn't support it) : fall back to sync + signal.signal(signal.SIGINT, lambda *_: _sigint_handler()) + + # Handle inputs (i.e. env variables injection) + resolved_inputs: dict[str, str] = {} + + if len(inputs) > 0: + print( + "[bold blue]Some initial inputs are required by the agent. 
" + "Please provide a value or leave empty to load from env.[/bold blue]" + ) + for input_item in inputs: + input_id = input_item["id"] + description = input_item["description"] + env_special_value = f"${{input:{input_id}}}" + + # Check if the input is used by any server or as an apiKey + input_usages = set() + for server in servers: + # Check stdio's "env" and http/sse's "headers" mappings + env_or_headers = server.get("env", {}) if server["type"] == "stdio" else server.get("headers", {}) + for key, value in env_or_headers.items(): + if env_special_value in value: + input_usages.add(key) + + raw_api_key = config.get("apiKey") + if isinstance(raw_api_key, str) and env_special_value in raw_api_key: + input_usages.add("apiKey") + + if not input_usages: + print( + f"[yellow]Input '{input_id}' defined in config but not used by any server or as an API key." + " Skipping.[/yellow]" + ) + continue + + # Prompt user for input + env_variable_key = input_id.replace("-", "_").upper() + print( + f"[blue] • {input_id}[/blue]: {description}. (default: load from {env_variable_key}).", + end=" ", + ) + user_input = (await _async_prompt(exit_event=exit_event)).strip() + if exit_event.is_set(): + return + + # Fallback to environment variable when user left blank + final_value = user_input + if not final_value: + final_value = os.getenv(env_variable_key, "") + if final_value: + print(f"[green]Value successfully loaded from '{env_variable_key}'[/green]") + else: + print( + f"[yellow]No value found for '{env_variable_key}' in environment variables. Continuing.[/yellow]" + ) + resolved_inputs[input_id] = final_value + + # Inject resolved value (can be empty) into stdio's env or http/sse's headers + for server in servers: + env_or_headers = server.get("env", {}) if server["type"] == "stdio" else server.get("headers", {}) + for key, value in env_or_headers.items(): + if env_special_value in value: + env_or_headers[key] = env_or_headers[key].replace(env_special_value, final_value) + + print() + + raw_api_key = config.get("apiKey") + if isinstance(raw_api_key, str): + substituted_api_key = raw_api_key + for input_id, val in resolved_inputs.items(): + substituted_api_key = substituted_api_key.replace(f"${{input:{input_id}}}", val) + config["apiKey"] = substituted_api_key + # Main agent loop + async with Agent( + provider=config.get("provider"), # type: ignore[arg-type] + model=config.get("model"), + base_url=config.get("endpointUrl"), # type: ignore[arg-type] + api_key=config.get("apiKey"), + servers=servers, # type: ignore[arg-type] + prompt=prompt, + ) as agent: + await agent.load_tools() + print(f"[bold blue]Agent loaded with {len(agent.available_tools)} tools:[/bold blue]") + for t in agent.available_tools: + print(f"[blue] • {t.function.name}[/blue]") + + while True: + abort_event.clear() + + # Check if we should exit + if exit_event.is_set(): + return + + try: + user_input = await _async_prompt(exit_event=exit_event) + first_sigint = True + except EOFError: + print("\n[red]EOF received, exiting.[/red]", flush=True) + break + except KeyboardInterrupt: + if not first_sigint and abort_event.is_set(): + continue + else: + print("\n[red]Keyboard interrupt during input processing.[/red]", flush=True) + break + + try: + async for chunk in agent.run(user_input, abort_event=abort_event): + if abort_event.is_set() and not first_sigint: + break + if exit_event.is_set(): + return + + if hasattr(chunk, "choices"): + delta = chunk.choices[0].delta + if delta.content: + print(delta.content, end="", flush=True) + if 
delta.tool_calls:
+                            for call in delta.tool_calls:
+                                if call.id:
+                                    print(f"<Tool {call.id}>", end="")
+                                if call.function.name:
+                                    print(f"{call.function.name}", end=" ")
+                                if call.function.arguments:
+                                    print(f"{call.function.arguments}", end="")
+                    else:
+                        print(
+                            f"\n\n[green]Tool[{chunk.name}] {chunk.tool_call_id}\n{chunk.content}[/green]\n",
+                            flush=True,
+                        )
+
+                print()
+
+            except Exception as e:
+                tb_str = traceback.format_exc()
+                print(f"\n[bold red]Error during agent run: {e}\n{tb_str}[/bold red]", flush=True)
+                first_sigint = True  # Allow graceful interrupt for the next command
+
+    except Exception as e:
+        tb_str = traceback.format_exc()
+        print(f"\n[bold red]An unexpected error occurred: {e}\n{tb_str}[/bold red]", flush=True)
+        raise e
+
+    finally:
+        if sigint_registered_in_loop:
+            try:
+                loop.remove_signal_handler(signal.SIGINT)
+            except (AttributeError, NotImplementedError):
+                pass
+        else:
+            signal.signal(signal.SIGINT, original_sigint_handler)
+
+
+@run_cli.callback()
+def run(
+    path: Optional[str] = typer.Argument(
+        None,
+        help=(
+            "Path to a local folder containing an agent.json file or a built-in agent "
+            "stored in the 'tiny-agents/tiny-agents' Hugging Face dataset "
+            "(https://huggingface.co/datasets/tiny-agents/tiny-agents)"
+        ),
+        show_default=False,
+    ),
+):
+    try:
+        asyncio.run(run_agent(path))
+    except KeyboardInterrupt:
+        print("\n[red]Application terminated by KeyboardInterrupt.[/red]", flush=True)
+        raise typer.Exit(code=130)
+    except Exception as e:
+        print(f"\n[bold red]An unexpected error occurred: {e}[/bold red]", flush=True)
+        raise e
+
+
+if __name__ == "__main__":
+    app()
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/constants.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..1ccade43b151cc9650bfd8cb43d7e907c92447ef
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/constants.py
@@ -0,0 +1,82 @@
+from __future__ import annotations
+
+import sys
+from pathlib import Path
+from typing import List
+
+from huggingface_hub import ChatCompletionInputTool
+
+
+FILENAME_CONFIG = "agent.json"
+PROMPT_FILENAMES = ("PROMPT.md", "AGENTS.md")
+
+DEFAULT_AGENT = {
+    "model": "Qwen/Qwen2.5-72B-Instruct",
+    "provider": "nebius",
+    "servers": [
+        {
+            "type": "stdio",
+            "command": "npx",
+            "args": [
+                "-y",
+                "@modelcontextprotocol/server-filesystem",
+                str(Path.home() / ("Desktop" if sys.platform == "darwin" else "")),
+            ],
+        },
+        {
+            "type": "stdio",
+            "command": "npx",
+            "args": ["@playwright/mcp@latest"],
+        },
+    ],
+}
+
+
+DEFAULT_SYSTEM_PROMPT = """
+You are an agent - please keep going until the user’s query is completely
+resolved, before ending your turn and yielding back to the user. Only terminate
+your turn when you are sure that the problem is solved, or if you need more
+info from the user to solve the problem.
+If you are not sure about anything pertaining to the user’s request, use your
+tools to read files and gather the relevant information: do NOT guess or make
+up an answer.
+You MUST plan extensively before each function call, and reflect extensively
+on the outcomes of the previous function calls. DO NOT do this entire process
+by making function calls only, as this can impair your ability to solve the
+problem and think insightfully.
+""".strip() + +MAX_NUM_TURNS = 10 + +TASK_COMPLETE_TOOL: ChatCompletionInputTool = ChatCompletionInputTool.parse_obj( # type: ignore[assignment] + { + "type": "function", + "function": { + "name": "task_complete", + "description": "Call this tool when the task given by the user is complete", + "parameters": { + "type": "object", + "properties": {}, + }, + }, + } +) + +ASK_QUESTION_TOOL: ChatCompletionInputTool = ChatCompletionInputTool.parse_obj( # type: ignore[assignment] + { + "type": "function", + "function": { + "name": "ask_question", + "description": "Ask the user for more info required to solve or clarify their problem.", + "parameters": { + "type": "object", + "properties": {}, + }, + }, + } +) + +EXIT_LOOP_TOOLS: List[ChatCompletionInputTool] = [TASK_COMPLETE_TOOL, ASK_QUESTION_TOOL] + + +DEFAULT_REPO_ID = "tiny-agents/tiny-agents" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/mcp_client.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/mcp_client.py new file mode 100644 index 0000000000000000000000000000000000000000..67d1fc5d15c898a4130f341e62e60d32c7663d28 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/mcp_client.py @@ -0,0 +1,384 @@ +import json +import logging +from contextlib import AsyncExitStack +from datetime import timedelta +from pathlib import Path +from typing import TYPE_CHECKING, Any, AsyncIterable, Dict, List, Literal, Optional, Union, overload + +from typing_extensions import NotRequired, TypeAlias, TypedDict, Unpack + +from ...utils._runtime import get_hf_hub_version +from .._generated._async_client import AsyncInferenceClient +from .._generated.types import ( + ChatCompletionInputMessage, + ChatCompletionInputTool, + ChatCompletionStreamOutput, + ChatCompletionStreamOutputDeltaToolCall, +) +from .._providers import PROVIDER_OR_POLICY_T +from .utils import format_result + + +if TYPE_CHECKING: + from mcp import ClientSession + +logger = logging.getLogger(__name__) + +# Type alias for tool names +ToolName: TypeAlias = str + +ServerType: TypeAlias = Literal["stdio", "sse", "http"] + + +class StdioServerParameters_T(TypedDict): + command: str + args: NotRequired[List[str]] + env: NotRequired[Dict[str, str]] + cwd: NotRequired[Union[str, Path, None]] + + +class SSEServerParameters_T(TypedDict): + url: str + headers: NotRequired[Dict[str, Any]] + timeout: NotRequired[float] + sse_read_timeout: NotRequired[float] + + +class StreamableHTTPParameters_T(TypedDict): + url: str + headers: NotRequired[dict[str, Any]] + timeout: NotRequired[timedelta] + sse_read_timeout: NotRequired[timedelta] + terminate_on_close: NotRequired[bool] + + +class MCPClient: + """ + Client for connecting to one or more MCP servers and processing chat completions with tools. + + > [!WARNING] + > This class is experimental and might be subject to breaking changes in the future without prior notice. + + Args: + model (`str`, `optional`): + The model to run inference with. Can be a model id hosted on the Hugging Face Hub, e.g. `meta-llama/Meta-Llama-3-8B-Instruct` + or a URL to a deployed Inference Endpoint or other local or remote endpoint. + provider (`str`, *optional*): + Name of the provider to use for inference. Defaults to "auto" i.e. the first of the providers available for the model, sorted by the user's order in https://hf.co/settings/inference-providers. + If model is a URL or `base_url` is passed, then `provider` is not used. 
+        base_url (`str`, *optional*):
+            The base URL to run inference. Defaults to None.
+        api_key (`str`, *optional*):
+            Token to use for authentication. Will default to the locally saved Hugging Face token if not provided. You can also use your own provider API key to interact directly with the provider's service.
+    """
+
+    def __init__(
+        self,
+        *,
+        model: Optional[str] = None,
+        provider: Optional[PROVIDER_OR_POLICY_T] = None,
+        base_url: Optional[str] = None,
+        api_key: Optional[str] = None,
+    ):
+        # Initialize MCP sessions as a dictionary of ClientSession objects
+        self.sessions: Dict[ToolName, "ClientSession"] = {}
+        self.exit_stack = AsyncExitStack()
+        self.available_tools: List[ChatCompletionInputTool] = []
+        # To be able to send the model in the payload if `base_url` is provided
+        if model is None and base_url is None:
+            raise ValueError("At least one of `model` or `base_url` should be set in `MCPClient`.")
+        self.payload_model = model
+        self.client = AsyncInferenceClient(
+            model=None if base_url is not None else model,
+            provider=provider,
+            api_key=api_key,
+            base_url=base_url,
+        )
+
+    async def __aenter__(self):
+        """Enter the context manager"""
+        await self.client.__aenter__()
+        await self.exit_stack.__aenter__()
+        return self
+
+    async def __aexit__(self, exc_type, exc_val, exc_tb):
+        """Exit the context manager"""
+        await self.client.__aexit__(exc_type, exc_val, exc_tb)
+        await self.cleanup()
+
+    async def cleanup(self):
+        """Clean up resources"""
+        await self.client.close()
+        await self.exit_stack.aclose()
+
+    @overload
+    async def add_mcp_server(self, type: Literal["stdio"], **params: Unpack[StdioServerParameters_T]): ...
+
+    @overload
+    async def add_mcp_server(self, type: Literal["sse"], **params: Unpack[SSEServerParameters_T]): ...
+
+    @overload
+    async def add_mcp_server(self, type: Literal["http"], **params: Unpack[StreamableHTTPParameters_T]): ...
+
+    async def add_mcp_server(self, type: ServerType, **params: Any):
+        """Connect to an MCP server
+
+        Args:
+            type (`str`):
+                Type of the server to connect to.
Can be one of:
+                - "stdio": Standard input/output server (local)
+                - "sse": Server-sent events (SSE) server
+                - "http": StreamableHTTP server
+            **params (`Dict[str, Any]`):
+                Server parameters that can be either:
+                - For stdio servers:
+                    - command (str): The command to run the MCP server
+                    - args (List[str], optional): Arguments for the command
+                    - env (Dict[str, str], optional): Environment variables for the command
+                    - cwd (Union[str, Path, None], optional): Working directory for the command
+                    - allowed_tools (List[str], optional): List of tool names to allow from this server
+                - For SSE servers:
+                    - url (str): The URL of the SSE server
+                    - headers (Dict[str, Any], optional): Headers for the SSE connection
+                    - timeout (float, optional): Connection timeout
+                    - sse_read_timeout (float, optional): SSE read timeout
+                    - allowed_tools (List[str], optional): List of tool names to allow from this server
+                - For StreamableHTTP servers:
+                    - url (str): The URL of the StreamableHTTP server
+                    - headers (Dict[str, Any], optional): Headers for the StreamableHTTP connection
+                    - timeout (timedelta, optional): Connection timeout
+                    - sse_read_timeout (timedelta, optional): SSE read timeout
+                    - terminate_on_close (bool, optional): Whether to terminate on close
+                    - allowed_tools (List[str], optional): List of tool names to allow from this server
+        """
+        from mcp import ClientSession, StdioServerParameters
+        from mcp import types as mcp_types
+
+        # Extract allowed_tools configuration if provided
+        allowed_tools = params.pop("allowed_tools", None)
+
+        # Determine server type and create appropriate parameters
+        if type == "stdio":
+            # Handle stdio server
+            from mcp.client.stdio import stdio_client
+
+            logger.info(f"Connecting to stdio MCP server with command: {params['command']} {params.get('args', [])}")
+
+            client_kwargs = {"command": params["command"]}
+            for key in ["args", "env", "cwd"]:
+                if params.get(key) is not None:
+                    client_kwargs[key] = params[key]
+            server_params = StdioServerParameters(**client_kwargs)
+            read, write = await self.exit_stack.enter_async_context(stdio_client(server_params))
+        elif type == "sse":
+            # Handle SSE server
+            from mcp.client.sse import sse_client
+
+            logger.info(f"Connecting to SSE MCP server at: {params['url']}")
+
+            client_kwargs = {"url": params["url"]}
+            for key in ["headers", "timeout", "sse_read_timeout"]:
+                if params.get(key) is not None:
+                    client_kwargs[key] = params[key]
+            read, write = await self.exit_stack.enter_async_context(sse_client(**client_kwargs))
+        elif type == "http":
+            # Handle StreamableHTTP server
+            from mcp.client.streamable_http import streamablehttp_client
+
+            logger.info(f"Connecting to StreamableHTTP MCP server at: {params['url']}")
+
+            client_kwargs = {"url": params["url"]}
+            for key in ["headers", "timeout", "sse_read_timeout", "terminate_on_close"]:
+                if params.get(key) is not None:
+                    client_kwargs[key] = params[key]
+            read, write, _ = await self.exit_stack.enter_async_context(streamablehttp_client(**client_kwargs))
+            # ^ TODO: should we handle `get_session_id_callback` (a function to retrieve the current session ID)?
+        else:
+            raise ValueError(f"Unsupported server type: {type}")
+
+        session = await self.exit_stack.enter_async_context(
+            ClientSession(
+                read_stream=read,
+                write_stream=write,
+                client_info=mcp_types.Implementation(
+                    name="huggingface_hub.MCPClient",
+                    version=get_hf_hub_version(),
+                ),
+            )
+        )
+
+        logger.debug("Initializing session...")
+        await session.initialize()
+
+        # List available tools
+        response = await session.list_tools()
+        logger.debug("Connected to server with tools: %s", [tool.name for tool in response.tools])
+
+        # Filter tools based on allowed_tools configuration
+        filtered_tools = response.tools
+
+        if allowed_tools is not None:
+            filtered_tools = [tool for tool in response.tools if tool.name in allowed_tools]
+            logger.debug(
+                f"Tool filtering applied. Using {len(filtered_tools)} of {len(response.tools)} available tools: {[tool.name for tool in filtered_tools]}"
+            )
+
+        for tool in filtered_tools:
+            if tool.name in self.sessions:
+                logger.warning(f"Tool '{tool.name}' already defined by another server. Skipping.")
+                continue
+
+            # Map tool names to their server for later lookup
+            self.sessions[tool.name] = session
+
+            # Add tool to the list of available tools (for use in chat completions)
+            self.available_tools.append(
+                ChatCompletionInputTool.parse_obj_as_instance(
+                    {
+                        "type": "function",
+                        "function": {
+                            "name": tool.name,
+                            "description": tool.description,
+                            "parameters": tool.inputSchema,
+                        },
+                    }
+                )
+            )
+
+    async def process_single_turn_with_tools(
+        self,
+        messages: List[Union[Dict, ChatCompletionInputMessage]],
+        exit_loop_tools: Optional[List[ChatCompletionInputTool]] = None,
+        exit_if_first_chunk_no_tool: bool = False,
+    ) -> AsyncIterable[Union[ChatCompletionStreamOutput, ChatCompletionInputMessage]]:
+        """Process a query using `self.model` and available tools, yielding chunks and tool outputs.
+
+        Args:
+            messages (`List[Dict]`):
+                List of message objects representing the conversation history
+            exit_loop_tools (`List[ChatCompletionInputTool]`, *optional*):
+                List of tools that should exit the generator when called
+            exit_if_first_chunk_no_tool (`bool`, *optional*):
+                Exit if no tool is present in the first chunks. Defaults to False.
+ + Yields: + [`ChatCompletionStreamOutput`] chunks or [`ChatCompletionInputMessage`] objects + """ + # Prepare tools list based on options + tools = self.available_tools + if exit_loop_tools is not None: + tools = [*exit_loop_tools, *self.available_tools] + + # Create the streaming request + response = await self.client.chat.completions.create( + model=self.payload_model, + messages=messages, + tools=tools, + tool_choice="auto", + stream=True, + ) + + message: Dict[str, Any] = {"role": "unknown", "content": ""} + final_tool_calls: Dict[int, ChatCompletionStreamOutputDeltaToolCall] = {} + num_of_chunks = 0 + + # Read from stream + async for chunk in response: + num_of_chunks += 1 + delta = chunk.choices[0].delta if chunk.choices and len(chunk.choices) > 0 else None + if not delta: + continue + + # Process message + if delta.role: + message["role"] = delta.role + if delta.content: + message["content"] += delta.content + + # Process tool calls + if delta.tool_calls: + for tool_call in delta.tool_calls: + idx = tool_call.index + # first chunk for this tool call + if idx not in final_tool_calls: + final_tool_calls[idx] = tool_call + if final_tool_calls[idx].function.arguments is None: + final_tool_calls[idx].function.arguments = "" + continue + # safety before concatenating text to .function.arguments + if final_tool_calls[idx].function.arguments is None: + final_tool_calls[idx].function.arguments = "" + + if tool_call.function.arguments: + final_tool_calls[idx].function.arguments += tool_call.function.arguments + + # Optionally exit early if no tools in first chunks + if exit_if_first_chunk_no_tool and num_of_chunks <= 2 and len(final_tool_calls) == 0: + return + + # Yield each chunk to caller + yield chunk + + # Add the assistant message with tool calls (if any) to messages + if message["content"] or final_tool_calls: + # if the role is unknown, set it to assistant + if message.get("role") == "unknown": + message["role"] = "assistant" + # Convert final_tool_calls to the format expected by OpenAI + if final_tool_calls: + tool_calls_list: List[Dict[str, Any]] = [] + for tc in final_tool_calls.values(): + tool_calls_list.append( + { + "id": tc.id, + "type": "function", + "function": { + "name": tc.function.name, + "arguments": tc.function.arguments or "{}", + }, + } + ) + message["tool_calls"] = tool_calls_list + messages.append(message) + + # Process tool calls one by one + for tool_call in final_tool_calls.values(): + function_name = tool_call.function.name + try: + function_args = json.loads(tool_call.function.arguments or "{}") + except json.JSONDecodeError as err: + tool_message = { + "role": "tool", + "tool_call_id": tool_call.id, + "name": function_name, + "content": f"Invalid JSON generated by the model: {err}", + } + tool_message_as_obj = ChatCompletionInputMessage.parse_obj_as_instance(tool_message) + messages.append(tool_message_as_obj) + yield tool_message_as_obj + continue # move to next tool call + + tool_message = {"role": "tool", "tool_call_id": tool_call.id, "content": "", "name": function_name} + + # Check if this is an exit loop tool + if exit_loop_tools and function_name in [t.function.name for t in exit_loop_tools]: + tool_message_as_obj = ChatCompletionInputMessage.parse_obj_as_instance(tool_message) + messages.append(tool_message_as_obj) + yield tool_message_as_obj + return + + # Execute tool call with the appropriate session + session = self.sessions.get(function_name) + if session is not None: + try: + result = await session.call_tool(function_name, function_args) + 
tool_message["content"] = format_result(result) + except Exception as err: + tool_message["content"] = f"Error: MCP tool call failed with error message: {err}" + else: + tool_message["content"] = f"Error: No session found for tool: {function_name}" + + # Yield tool message + tool_message_as_obj = ChatCompletionInputMessage.parse_obj_as_instance(tool_message) + messages.append(tool_message_as_obj) + yield tool_message_as_obj diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/types.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/types.py new file mode 100644 index 0000000000000000000000000000000000000000..100f67832ea02d7d5b6886d117536e97efe1c6ff --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/types.py @@ -0,0 +1,45 @@ +from typing import Dict, List, Literal, TypedDict, Union + +from typing_extensions import NotRequired + + +class InputConfig(TypedDict, total=False): + id: str + description: str + type: str + password: bool + + +class StdioServerConfig(TypedDict): + type: Literal["stdio"] + command: str + args: List[str] + env: Dict[str, str] + cwd: str + allowed_tools: NotRequired[List[str]] + + +class HTTPServerConfig(TypedDict): + type: Literal["http"] + url: str + headers: Dict[str, str] + allowed_tools: NotRequired[List[str]] + + +class SSEServerConfig(TypedDict): + type: Literal["sse"] + url: str + headers: Dict[str, str] + allowed_tools: NotRequired[List[str]] + + +ServerConfig = Union[StdioServerConfig, HTTPServerConfig, SSEServerConfig] + + +# AgentConfig root object +class AgentConfig(TypedDict): + model: str + provider: str + apiKey: NotRequired[str] + inputs: List[InputConfig] + servers: List[ServerConfig] diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/utils.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..ddab10d6770397e4b1ad20ef4470679f3bfd60bb --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_mcp/utils.py @@ -0,0 +1,128 @@ +""" +Utility functions for MCPClient and Tiny Agents. + +Formatting utilities taken from the JS SDK: https://github.com/huggingface/huggingface.js/blob/main/packages/mcp-client/src/ResultFormatter.ts. +""" + +import json +from pathlib import Path +from typing import TYPE_CHECKING, List, Optional, Tuple + +from huggingface_hub import snapshot_download +from huggingface_hub.errors import EntryNotFoundError + +from .constants import DEFAULT_AGENT, DEFAULT_REPO_ID, FILENAME_CONFIG, PROMPT_FILENAMES +from .types import AgentConfig + + +if TYPE_CHECKING: + from mcp import types as mcp_types + + +def format_result(result: "mcp_types.CallToolResult") -> str: + """ + Formats a mcp.types.CallToolResult content into a human-readable string. + + Args: + result (CallToolResult) + Object returned by mcp.ClientSession.call_tool. + + Returns: + str + A formatted string representing the content of the result. 
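+
+    Example (illustrative sketch; assumes the `mcp` SDK models behave as documented):
+
+    ```python
+    >>> from mcp import types as mcp_types
+    >>> result = mcp_types.CallToolResult(content=[mcp_types.TextContent(type="text", text="hello")])
+    >>> format_result(result)
+    'hello'
+    ```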
+ """ + content = result.content + + if len(content) == 0: + return "[No content]" + + formatted_parts: List[str] = [] + + for item in content: + if item.type == "text": + formatted_parts.append(item.text) + + elif item.type == "image": + formatted_parts.append( + f"[Binary Content: Image {item.mimeType}, {_get_base64_size(item.data)} bytes]\n" + f"The task is complete and the content accessible to the User" + ) + + elif item.type == "audio": + formatted_parts.append( + f"[Binary Content: Audio {item.mimeType}, {_get_base64_size(item.data)} bytes]\n" + f"The task is complete and the content accessible to the User" + ) + + elif item.type == "resource": + resource = item.resource + + if hasattr(resource, "text"): + formatted_parts.append(resource.text) + + elif hasattr(resource, "blob"): + formatted_parts.append( + f"[Binary Content ({resource.uri}): {resource.mimeType}, {_get_base64_size(resource.blob)} bytes]\n" + f"The task is complete and the content accessible to the User" + ) + + return "\n".join(formatted_parts) + + +def _get_base64_size(base64_str: str) -> int: + """Estimate the byte size of a base64-encoded string.""" + # Remove any prefix like "data:image/png;base64," + if "," in base64_str: + base64_str = base64_str.split(",")[1] + + padding = 0 + if base64_str.endswith("=="): + padding = 2 + elif base64_str.endswith("="): + padding = 1 + + return (len(base64_str) * 3) // 4 - padding + + +def _load_agent_config(agent_path: Optional[str]) -> Tuple[AgentConfig, Optional[str]]: + """Load server config and prompt.""" + + def _read_dir(directory: Path) -> Tuple[AgentConfig, Optional[str]]: + cfg_file = directory / FILENAME_CONFIG + if not cfg_file.exists(): + raise FileNotFoundError(f" Config file not found in {directory}! Please make sure it exists locally") + + config: AgentConfig = json.loads(cfg_file.read_text(encoding="utf-8")) + prompt: Optional[str] = None + for filename in PROMPT_FILENAMES: + prompt_file = directory / filename + if prompt_file.exists(): + prompt = prompt_file.read_text(encoding="utf-8") + break + return config, prompt + + if agent_path is None: + return DEFAULT_AGENT, None # type: ignore[return-value] + + path = Path(agent_path).expanduser() + + if path.is_file(): + return json.loads(path.read_text(encoding="utf-8")), None + + if path.is_dir(): + return _read_dir(path) + + # fetch from the Hub + try: + repo_dir = Path( + snapshot_download( + repo_id=DEFAULT_REPO_ID, + allow_patterns=f"{agent_path}/*", + repo_type="dataset", + ) + ) + return _read_dir(repo_dir / agent_path) + except Exception as err: + raise EntryNotFoundError( + f" Agent {agent_path} not found in tiny-agents/tiny-agents! Please make sure it exists in https://huggingface.co/datasets/tiny-agents/tiny-agents." 
+ ) from err diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..79d2bd75c8329f73bf466cd6b14467579595d180 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__init__.py @@ -0,0 +1,231 @@ +from typing import Dict, Literal, Optional, Union + +from huggingface_hub.inference._providers.featherless_ai import ( + FeatherlessConversationalTask, + FeatherlessTextGenerationTask, +) +from huggingface_hub.utils import logging + +from ._common import TaskProviderHelper, _fetch_inference_provider_mapping +from .black_forest_labs import BlackForestLabsTextToImageTask +from .cerebras import CerebrasConversationalTask +from .clarifai import ClarifaiConversationalTask +from .cohere import CohereConversationalTask +from .fal_ai import ( + FalAIAutomaticSpeechRecognitionTask, + FalAIImageToImageTask, + FalAIImageToVideoTask, + FalAITextToImageTask, + FalAITextToSpeechTask, + FalAITextToVideoTask, +) +from .fireworks_ai import FireworksAIConversationalTask +from .groq import GroqConversationalTask +from .hf_inference import ( + HFInferenceBinaryInputTask, + HFInferenceConversational, + HFInferenceFeatureExtractionTask, + HFInferenceTask, +) +from .hyperbolic import HyperbolicTextGenerationTask, HyperbolicTextToImageTask +from .nebius import ( + NebiusConversationalTask, + NebiusFeatureExtractionTask, + NebiusTextGenerationTask, + NebiusTextToImageTask, +) +from .novita import NovitaConversationalTask, NovitaTextGenerationTask, NovitaTextToVideoTask +from .nscale import NscaleConversationalTask, NscaleTextToImageTask +from .openai import OpenAIConversationalTask +from .publicai import PublicAIConversationalTask +from .replicate import ReplicateImageToImageTask, ReplicateTask, ReplicateTextToImageTask, ReplicateTextToSpeechTask +from .sambanova import SambanovaConversationalTask, SambanovaFeatureExtractionTask +from .scaleway import ScalewayConversationalTask, ScalewayFeatureExtractionTask +from .together import TogetherConversationalTask, TogetherTextGenerationTask, TogetherTextToImageTask +from .zai_org import ZaiConversationalTask + + +logger = logging.get_logger(__name__) + + +PROVIDER_T = Literal[ + "black-forest-labs", + "cerebras", + "clarifai", + "cohere", + "fal-ai", + "featherless-ai", + "fireworks-ai", + "groq", + "hf-inference", + "hyperbolic", + "nebius", + "novita", + "nscale", + "openai", + "publicai", + "replicate", + "sambanova", + "scaleway", + "together", + "zai-org", +] + +PROVIDER_OR_POLICY_T = Union[PROVIDER_T, Literal["auto"]] + +PROVIDERS: Dict[PROVIDER_T, Dict[str, TaskProviderHelper]] = { + "black-forest-labs": { + "text-to-image": BlackForestLabsTextToImageTask(), + }, + "cerebras": { + "conversational": CerebrasConversationalTask(), + }, + "clarifai": { + "conversational": ClarifaiConversationalTask(), + }, + "cohere": { + "conversational": CohereConversationalTask(), + }, + "fal-ai": { + "automatic-speech-recognition": FalAIAutomaticSpeechRecognitionTask(), + "text-to-image": FalAITextToImageTask(), + "text-to-speech": FalAITextToSpeechTask(), + "text-to-video": FalAITextToVideoTask(), + "image-to-video": FalAIImageToVideoTask(), + "image-to-image": FalAIImageToImageTask(), + }, + "featherless-ai": { + "conversational": FeatherlessConversationalTask(), + "text-generation": FeatherlessTextGenerationTask(), + }, + 
"fireworks-ai": { + "conversational": FireworksAIConversationalTask(), + }, + "groq": { + "conversational": GroqConversationalTask(), + }, + "hf-inference": { + "text-to-image": HFInferenceTask("text-to-image"), + "conversational": HFInferenceConversational(), + "text-generation": HFInferenceTask("text-generation"), + "text-classification": HFInferenceTask("text-classification"), + "question-answering": HFInferenceTask("question-answering"), + "audio-classification": HFInferenceBinaryInputTask("audio-classification"), + "automatic-speech-recognition": HFInferenceBinaryInputTask("automatic-speech-recognition"), + "fill-mask": HFInferenceTask("fill-mask"), + "feature-extraction": HFInferenceFeatureExtractionTask(), + "image-classification": HFInferenceBinaryInputTask("image-classification"), + "image-segmentation": HFInferenceBinaryInputTask("image-segmentation"), + "document-question-answering": HFInferenceTask("document-question-answering"), + "image-to-text": HFInferenceBinaryInputTask("image-to-text"), + "object-detection": HFInferenceBinaryInputTask("object-detection"), + "audio-to-audio": HFInferenceBinaryInputTask("audio-to-audio"), + "zero-shot-image-classification": HFInferenceBinaryInputTask("zero-shot-image-classification"), + "zero-shot-classification": HFInferenceTask("zero-shot-classification"), + "image-to-image": HFInferenceBinaryInputTask("image-to-image"), + "sentence-similarity": HFInferenceTask("sentence-similarity"), + "table-question-answering": HFInferenceTask("table-question-answering"), + "tabular-classification": HFInferenceTask("tabular-classification"), + "text-to-speech": HFInferenceTask("text-to-speech"), + "token-classification": HFInferenceTask("token-classification"), + "translation": HFInferenceTask("translation"), + "summarization": HFInferenceTask("summarization"), + "visual-question-answering": HFInferenceBinaryInputTask("visual-question-answering"), + }, + "hyperbolic": { + "text-to-image": HyperbolicTextToImageTask(), + "conversational": HyperbolicTextGenerationTask("conversational"), + "text-generation": HyperbolicTextGenerationTask("text-generation"), + }, + "nebius": { + "text-to-image": NebiusTextToImageTask(), + "conversational": NebiusConversationalTask(), + "text-generation": NebiusTextGenerationTask(), + "feature-extraction": NebiusFeatureExtractionTask(), + }, + "novita": { + "text-generation": NovitaTextGenerationTask(), + "conversational": NovitaConversationalTask(), + "text-to-video": NovitaTextToVideoTask(), + }, + "nscale": { + "conversational": NscaleConversationalTask(), + "text-to-image": NscaleTextToImageTask(), + }, + "openai": { + "conversational": OpenAIConversationalTask(), + }, + "publicai": { + "conversational": PublicAIConversationalTask(), + }, + "replicate": { + "image-to-image": ReplicateImageToImageTask(), + "text-to-image": ReplicateTextToImageTask(), + "text-to-speech": ReplicateTextToSpeechTask(), + "text-to-video": ReplicateTask("text-to-video"), + }, + "sambanova": { + "conversational": SambanovaConversationalTask(), + "feature-extraction": SambanovaFeatureExtractionTask(), + }, + "scaleway": { + "conversational": ScalewayConversationalTask(), + "feature-extraction": ScalewayFeatureExtractionTask(), + }, + "together": { + "text-to-image": TogetherTextToImageTask(), + "conversational": TogetherConversationalTask(), + "text-generation": TogetherTextGenerationTask(), + }, + "zai-org": { + "conversational": ZaiConversationalTask(), + }, +} + + +def get_provider_helper( + provider: Optional[PROVIDER_OR_POLICY_T], task: str, 
model: Optional[str]
+) -> TaskProviderHelper:
+    """Get provider helper instance by name and task.
+
+    Args:
+        provider (`str`, *optional*): name of the provider, or "auto" to automatically select the provider for the model.
+        task (`str`): Name of the task
+        model (`str`, *optional*): Name of the model
+    Returns:
+        TaskProviderHelper: Helper instance for the specified provider and task
+
+    Raises:
+        ValueError: If provider or task is not supported
+    """
+
+    if (model is None and provider in (None, "auto")) or (
+        model is not None and model.startswith(("http://", "https://"))
+    ):
+        provider = "hf-inference"
+
+    if provider is None:
+        logger.info(
+            "Defaulting to 'auto' which will select the first provider available for the model, sorted by the user's order in https://hf.co/settings/inference-providers."
+        )
+        provider = "auto"
+
+    if provider == "auto":
+        if model is None:
+            raise ValueError("Specifying a model is required when provider is 'auto'")
+        provider_mapping = _fetch_inference_provider_mapping(model)
+        provider = next(iter(provider_mapping)).provider
+
+    provider_tasks = PROVIDERS.get(provider)  # type: ignore
+    if provider_tasks is None:
+        raise ValueError(
+            f"Provider '{provider}' not supported. Available values: 'auto' or any provider from {list(PROVIDERS.keys())}. "
+            "Passing 'auto' (default value) will automatically select the first provider available for the model, sorted "
+            "by the user's order in https://hf.co/settings/inference-providers."
+        )
+
+    if task not in provider_tasks:
+        raise ValueError(
+            f"Task '{task}' not supported for provider '{provider}'. Available tasks: {list(provider_tasks.keys())}"
+        )
+    return provider_tasks[task]
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e9109f58905772a72944ac7faedd3bf7156f9f0c Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/_common.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/_common.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..dc9479b4afa6f65514bf3cd6c385ce9acd6f11dd Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/_common.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/black_forest_labs.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/black_forest_labs.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..55dd1cca87b7eb30239ba974a2b8e9da9b829738 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/black_forest_labs.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/cerebras.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/cerebras.cpython-312.pyc new file mode 100644 index 
0000000000000000000000000000000000000000..c548d1ace0c10022d6fe456b115a229c2a9e6c69 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/cerebras.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/clarifai.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/clarifai.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ac8d68517ecfc74eb56d648851d4118dc6960fa4 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/clarifai.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/cohere.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/cohere.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..f34b50cc01283e838952279c0b3f6fb681a5e30d Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/cohere.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/fal_ai.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/fal_ai.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..0d8903d1600e9f0f881e2e5dee86ab695176f296 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/fal_ai.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/featherless_ai.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/featherless_ai.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ce29b6f7ebcaf07353c5a9854fa90acff5ef2b25 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/featherless_ai.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/fireworks_ai.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/fireworks_ai.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5b0d81b51553a1eddb72883135224c6016d5529e Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/fireworks_ai.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/groq.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/groq.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..22d13e6a81522ee5322479bd7ef6b54dc9c1fe71 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/groq.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/hf_inference.cpython-312.pyc 
b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/hf_inference.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..59c2828aca80030828b6cdc4fabc85a677f6c2d2 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/hf_inference.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/hyperbolic.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/hyperbolic.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..bcd762e9e247c7c09a84a53e34903326fa225722 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/hyperbolic.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/nebius.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/nebius.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..b75746d98235aafa684da77ce0f13d77e0923fdb Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/nebius.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/novita.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/novita.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..4fdbe30b380cbe1f505ec9e732117772469d7e09 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/novita.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/nscale.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/nscale.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..b000ab58f513b1a600ecd23d8c053747e681b80a Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/nscale.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/openai.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/openai.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c92b694d8e08bc6f8ff2ef2e617be8dcc8efbec5 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/openai.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/publicai.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/publicai.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ce02c0a4f32331c2c34fc998975f8ccaa5259257 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/publicai.cpython-312.pyc differ diff --git 
a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/replicate.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/replicate.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..cd4efd843fd05274599d1c822264670d76a7d0d1 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/replicate.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/sambanova.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/sambanova.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..07cb4203e4300239da3110e7f59e5c12ddd9a23a Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/sambanova.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/scaleway.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/scaleway.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..9fa0d7e472b21227dc66f0ba6968c3f74dfac6fe Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/scaleway.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/together.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/together.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..8f951a1c49e427945b45b335f0f2939c64110134 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/together.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/zai_org.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/zai_org.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..74e327d6b6ac984fea647aae008a577fbd434046 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/__pycache__/zai_org.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/_common.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/_common.py new file mode 100644 index 0000000000000000000000000000000000000000..366fc3f45d6760e21c748e0ead7e4b3510efbc72 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/_common.py @@ -0,0 +1,323 @@ +from functools import lru_cache +from typing import Any, Dict, List, Optional, Union, overload + +from huggingface_hub import constants +from huggingface_hub.hf_api import InferenceProviderMapping +from huggingface_hub.inference._common import MimeBytes, RequestParameters +from huggingface_hub.inference._generated.types.chat_completion import ChatCompletionInputMessage +from huggingface_hub.utils import build_hf_headers, get_token, logging + + +logger = logging.get_logger(__name__) + + +# Dev purposes only. 
+# If you want to try to run inference for a new model locally before it's registered on huggingface.co +# for a given Inference Provider, you can add it to the following dictionary. +HARDCODED_MODEL_INFERENCE_MAPPING: Dict[str, Dict[str, InferenceProviderMapping]] = { + # "HF model ID" => InferenceProviderMapping object initialized with "Model ID on Inference Provider's side" + # + # Example: + # "Qwen/Qwen2.5-Coder-32B-Instruct": InferenceProviderMapping(hf_model_id="Qwen/Qwen2.5-Coder-32B-Instruct", + # provider_id="Qwen2.5-Coder-32B-Instruct", + # task="conversational", + # status="live") + "cerebras": {}, + "cohere": {}, + "clarifai": {}, + "fal-ai": {}, + "fireworks-ai": {}, + "groq": {}, + "hf-inference": {}, + "hyperbolic": {}, + "nebius": {}, + "nscale": {}, + "replicate": {}, + "sambanova": {}, + "scaleway": {}, + "together": {}, + "zai-org": {}, +} + + +@overload +def filter_none(obj: Dict[str, Any]) -> Dict[str, Any]: ... +@overload +def filter_none(obj: List[Any]) -> List[Any]: ... + + +def filter_none(obj: Union[Dict[str, Any], List[Any]]) -> Union[Dict[str, Any], List[Any]]: + if isinstance(obj, dict): + cleaned: Dict[str, Any] = {} + for k, v in obj.items(): + if v is None: + continue + if isinstance(v, (dict, list)): + v = filter_none(v) + cleaned[k] = v + return cleaned + + if isinstance(obj, list): + return [filter_none(v) if isinstance(v, (dict, list)) else v for v in obj] + + raise ValueError(f"Expected dict or list, got {type(obj)}") + + +class TaskProviderHelper: + """Base class for task-specific provider helpers.""" + + def __init__(self, provider: str, base_url: str, task: str) -> None: + self.provider = provider + self.task = task + self.base_url = base_url + + def prepare_request( + self, + *, + inputs: Any, + parameters: Dict[str, Any], + headers: Dict, + model: Optional[str], + api_key: Optional[str], + extra_payload: Optional[Dict[str, Any]] = None, + ) -> RequestParameters: + """ + Prepare the request to be sent to the provider. + + Each step (api_key, model, headers, url, payload) can be customized in subclasses. 
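+
+        Example (minimal sketch; the model ID and `hf_xxx` token are placeholders):
+
+        ```python
+        from huggingface_hub.inference._providers import get_provider_helper
+
+        helper = get_provider_helper(
+            "hf-inference", task="text-classification", model="distilbert-base-uncased-finetuned-sst-2-english"
+        )
+        request = helper.prepare_request(
+            inputs="I love this!",
+            parameters={},
+            headers={},
+            model="distilbert-base-uncased-finetuned-sst-2-english",
+            api_key="hf_xxx",  # placeholder token
+        )
+        # request.url, request.headers and request.json are ready to send with any HTTP client
+        ```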
+ """ + # api_key from user, or local token, or raise error + api_key = self._prepare_api_key(api_key) + + # mapped model from HF model ID + provider_mapping_info = self._prepare_mapping_info(model) + + # default HF headers + user headers (to customize in subclasses) + headers = self._prepare_headers(headers, api_key) + + # routed URL if HF token, or direct URL (to customize in '_prepare_route' in subclasses) + url = self._prepare_url(api_key, provider_mapping_info.provider_id) + + # prepare payload (to customize in subclasses) + payload = self._prepare_payload_as_dict(inputs, parameters, provider_mapping_info=provider_mapping_info) + if payload is not None: + payload = recursive_merge(payload, filter_none(extra_payload or {})) + + # body data (to customize in subclasses) + data = self._prepare_payload_as_bytes(inputs, parameters, provider_mapping_info, extra_payload) + + # check if both payload and data are set and return + if payload is not None and data is not None: + raise ValueError("Both payload and data cannot be set in the same request.") + if payload is None and data is None: + raise ValueError("Either payload or data must be set in the request.") + + # normalize headers to lowercase and add content-type if not present + normalized_headers = self._normalize_headers(headers, payload, data) + + return RequestParameters( + url=url, + task=self.task, + model=provider_mapping_info.provider_id, + json=payload, + data=data, + headers=normalized_headers, + ) + + def get_response( + self, + response: Union[bytes, Dict], + request_params: Optional[RequestParameters] = None, + ) -> Any: + """ + Return the response in the expected format. + + Override this method in subclasses for customized response handling.""" + return response + + def _prepare_api_key(self, api_key: Optional[str]) -> str: + """Return the API key to use for the request. + + Usually not overwritten in subclasses.""" + if api_key is None: + api_key = get_token() + if api_key is None: + raise ValueError( + f"You must provide an api_key to work with {self.provider} API or log in with `hf auth login`." + ) + return api_key + + def _prepare_mapping_info(self, model: Optional[str]) -> InferenceProviderMapping: + """Return the mapped model ID to use for the request. + + Usually not overwritten in subclasses.""" + if model is None: + raise ValueError(f"Please provide an HF model ID supported by {self.provider}.") + + # hardcoded mapping for local testing + if HARDCODED_MODEL_INFERENCE_MAPPING.get(self.provider, {}).get(model): + return HARDCODED_MODEL_INFERENCE_MAPPING[self.provider][model] + + provider_mapping = None + for mapping in _fetch_inference_provider_mapping(model): + if mapping.provider == self.provider: + provider_mapping = mapping + break + + if provider_mapping is None: + raise ValueError(f"Model {model} is not supported by provider {self.provider}.") + + if provider_mapping.task != self.task: + raise ValueError( + f"Model {model} is not supported for task {self.task} and provider {self.provider}. " + f"Supported task: {provider_mapping.task}." + ) + + if provider_mapping.status == "staging": + logger.warning( + f"Model {model} is in staging mode for provider {self.provider}. Meant for test purposes only." + ) + if provider_mapping.status == "error": + logger.warning( + f"Our latest automated health check on model '{model}' for provider '{self.provider}' did not complete successfully. " + "Inference call might fail." 
+ ) + return provider_mapping + + def _normalize_headers( + self, headers: Dict[str, Any], payload: Optional[Dict[str, Any]], data: Optional[MimeBytes] + ) -> Dict[str, Any]: + """Normalize the headers to use for the request. + + Override this method in subclasses for customized headers. + """ + normalized_headers = {key.lower(): value for key, value in headers.items() if value is not None} + if normalized_headers.get("content-type") is None: + if data is not None and data.mime_type is not None: + normalized_headers["content-type"] = data.mime_type + elif payload is not None: + normalized_headers["content-type"] = "application/json" + return normalized_headers + + def _prepare_headers(self, headers: Dict, api_key: str) -> Dict[str, Any]: + """Return the headers to use for the request. + + Override this method in subclasses for customized headers. + """ + return {**build_hf_headers(token=api_key), **headers} + + def _prepare_url(self, api_key: str, mapped_model: str) -> str: + """Return the URL to use for the request. + + Usually not overwritten in subclasses.""" + base_url = self._prepare_base_url(api_key) + route = self._prepare_route(mapped_model, api_key) + return f"{base_url.rstrip('/')}/{route.lstrip('/')}" + + def _prepare_base_url(self, api_key: str) -> str: + """Return the base URL to use for the request. + + Usually not overwritten in subclasses.""" + # Route to the proxy if the api_key is a HF TOKEN + if api_key.startswith("hf_"): + logger.info(f"Calling '{self.provider}' provider through Hugging Face router.") + return constants.INFERENCE_PROXY_TEMPLATE.format(provider=self.provider) + else: + logger.info(f"Calling '{self.provider}' provider directly.") + return self.base_url + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + """Return the route to use for the request. + + Override this method in subclasses for customized routes. + """ + return "" + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + """Return the payload to use for the request, as a dict. + + Override this method in subclasses for customized payloads. + Only one of `_prepare_payload_as_dict` and `_prepare_payload_as_bytes` should return a value. + """ + return None + + def _prepare_payload_as_bytes( + self, + inputs: Any, + parameters: Dict, + provider_mapping_info: InferenceProviderMapping, + extra_payload: Optional[Dict], + ) -> Optional[MimeBytes]: + """Return the body to use for the request, as bytes. + + Override this method in subclasses for customized body data. + Only one of `_prepare_payload_as_dict` and `_prepare_payload_as_bytes` should return a value. + """ + return None + + +class BaseConversationalTask(TaskProviderHelper): + """ + Base class for conversational (chat completion) tasks. 
+ The schema follows the OpenAI API format defined here: https://platform.openai.com/docs/api-reference/chat + """ + + def __init__(self, provider: str, base_url: str): + super().__init__(provider=provider, base_url=base_url, task="conversational") + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + return "/v1/chat/completions" + + def _prepare_payload_as_dict( + self, + inputs: List[Union[Dict, ChatCompletionInputMessage]], + parameters: Dict, + provider_mapping_info: InferenceProviderMapping, + ) -> Optional[Dict]: + return filter_none({"messages": inputs, **parameters, "model": provider_mapping_info.provider_id}) + + +class BaseTextGenerationTask(TaskProviderHelper): + """ + Base class for text-generation (completion) tasks. + The schema follows the OpenAI API format defined here: https://platform.openai.com/docs/api-reference/completions + """ + + def __init__(self, provider: str, base_url: str): + super().__init__(provider=provider, base_url=base_url, task="text-generation") + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + return "/v1/completions" + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + return filter_none({"prompt": inputs, **parameters, "model": provider_mapping_info.provider_id}) + + +@lru_cache(maxsize=None) +def _fetch_inference_provider_mapping(model: str) -> List["InferenceProviderMapping"]: + """ + Fetch provider mappings for a model from the Hub. + """ + from huggingface_hub.hf_api import HfApi + + info = HfApi().model_info(model, expand=["inferenceProviderMapping"]) + provider_mapping = info.inference_provider_mapping + if provider_mapping is None: + raise ValueError(f"No provider mapping found for model {model}") + return provider_mapping + + +def recursive_merge(dict1: Dict, dict2: Dict) -> Dict: + return { + **dict1, + **{ + key: recursive_merge(dict1[key], value) + if (key in dict1 and isinstance(dict1[key], dict) and isinstance(value, dict)) + else value + for key, value in dict2.items() + }, + } diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/black_forest_labs.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/black_forest_labs.py new file mode 100644 index 0000000000000000000000000000000000000000..a5d96832256e3505d503a7d23bbcee76e485561a --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/black_forest_labs.py @@ -0,0 +1,69 @@ +import time +from typing import Any, Dict, Optional, Union + +from huggingface_hub.hf_api import InferenceProviderMapping +from huggingface_hub.inference._common import RequestParameters, _as_dict +from huggingface_hub.inference._providers._common import TaskProviderHelper, filter_none +from huggingface_hub.utils import logging +from huggingface_hub.utils._http import get_session + + +logger = logging.get_logger(__name__) + +MAX_POLLING_ATTEMPTS = 6 +POLLING_INTERVAL = 1.0 + + +class BlackForestLabsTextToImageTask(TaskProviderHelper): + def __init__(self): + super().__init__(provider="black-forest-labs", base_url="https://api.us1.bfl.ai", task="text-to-image") + + def _prepare_headers(self, headers: Dict, api_key: str) -> Dict[str, Any]: + headers = super()._prepare_headers(headers, api_key) + if not api_key.startswith("hf_"): + _ = headers.pop("authorization") + headers["X-Key"] = api_key + return headers + + def _prepare_route(self, mapped_model: str, api_key: str) 
-> str: + return f"/v1/{mapped_model}" + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + parameters = filter_none(parameters) + if "num_inference_steps" in parameters: + parameters["steps"] = parameters.pop("num_inference_steps") + if "guidance_scale" in parameters: + parameters["guidance"] = parameters.pop("guidance_scale") + + return {"prompt": inputs, **parameters} + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + """ + Polling mechanism for Black Forest Labs since the API is asynchronous. + """ + url = _as_dict(response).get("polling_url") + session = get_session() + for _ in range(MAX_POLLING_ATTEMPTS): + time.sleep(POLLING_INTERVAL) + + response = session.get(url, headers={"Content-Type": "application/json"}) # type: ignore + response.raise_for_status() # type: ignore + response_json: Dict = response.json() # type: ignore + status = response_json.get("status") + logger.info( + f"Polling generation result from {url}. Current status: {status}. " + f"Will retry after {POLLING_INTERVAL} seconds if not ready." + ) + + if ( + status == "Ready" + and isinstance(response_json.get("result"), dict) + and (sample_url := response_json["result"].get("sample")) + ): + image_resp = session.get(sample_url) + image_resp.raise_for_status() + return image_resp.content + + raise TimeoutError(f"Failed to get the image URL after {MAX_POLLING_ATTEMPTS} attempts.") diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/cerebras.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/cerebras.py new file mode 100644 index 0000000000000000000000000000000000000000..a9b9c3aacb3e134a8e755297c15ece198ffe633d --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/cerebras.py @@ -0,0 +1,6 @@ +from ._common import BaseConversationalTask + + +class CerebrasConversationalTask(BaseConversationalTask): + def __init__(self): + super().__init__(provider="cerebras", base_url="https://api.cerebras.ai") diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/clarifai.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/clarifai.py new file mode 100644 index 0000000000000000000000000000000000000000..5f118b7fc9a8dafb01305758791191ccef045a5d --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/clarifai.py @@ -0,0 +1,13 @@ +from ._common import BaseConversationalTask + + +_PROVIDER = "clarifai" +_BASE_URL = "https://api.clarifai.com" + + +class ClarifaiConversationalTask(BaseConversationalTask): + def __init__(self): + super().__init__(provider=_PROVIDER, base_url=_BASE_URL) + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + return "/v2/ext/openai/v1/chat/completions" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/cohere.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/cohere.py new file mode 100644 index 0000000000000000000000000000000000000000..a5e9191caec50b0e659dddceba3e817a4ac28307 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/cohere.py @@ -0,0 +1,32 @@ +from typing import Any, Dict, Optional + +from huggingface_hub.hf_api import InferenceProviderMapping + +from 
._common import BaseConversationalTask + + +_PROVIDER = "cohere" +_BASE_URL = "https://api.cohere.com" + + +class CohereConversationalTask(BaseConversationalTask): + def __init__(self): + super().__init__(provider=_PROVIDER, base_url=_BASE_URL) + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + return "/compatibility/v1/chat/completions" + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + payload = super()._prepare_payload_as_dict(inputs, parameters, provider_mapping_info) + response_format = parameters.get("response_format") + if isinstance(response_format, dict) and response_format.get("type") == "json_schema": + json_schema_details = response_format.get("json_schema") + if isinstance(json_schema_details, dict) and "schema" in json_schema_details: + payload["response_format"] = { # type: ignore [index] + "type": "json_object", + "schema": json_schema_details["schema"], + } + + return payload diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/fal_ai.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/fal_ai.py new file mode 100644 index 0000000000000000000000000000000000000000..bc2c41d04f811f6a6508ea6abf84593add31ef42 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/fal_ai.py @@ -0,0 +1,248 @@ +import base64 +import time +from abc import ABC +from typing import Any, Dict, Optional, Union +from urllib.parse import urlparse + +from huggingface_hub import constants +from huggingface_hub.hf_api import InferenceProviderMapping +from huggingface_hub.inference._common import RequestParameters, _as_dict, _as_url +from huggingface_hub.inference._providers._common import TaskProviderHelper, filter_none +from huggingface_hub.utils import get_session, hf_raise_for_status +from huggingface_hub.utils.logging import get_logger + + +logger = get_logger(__name__) + +# Arbitrary polling interval +_POLLING_INTERVAL = 0.5 + + +class FalAITask(TaskProviderHelper, ABC): + def __init__(self, task: str): + super().__init__(provider="fal-ai", base_url="https://fal.run", task=task) + + def _prepare_headers(self, headers: Dict, api_key: str) -> Dict[str, Any]: + headers = super()._prepare_headers(headers, api_key) + if not api_key.startswith("hf_"): + headers["authorization"] = f"Key {api_key}" + return headers + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + return f"/{mapped_model}" + + +class FalAIQueueTask(TaskProviderHelper, ABC): + def __init__(self, task: str): + super().__init__(provider="fal-ai", base_url="https://queue.fal.run", task=task) + + def _prepare_headers(self, headers: Dict, api_key: str) -> Dict[str, Any]: + headers = super()._prepare_headers(headers, api_key) + if not api_key.startswith("hf_"): + headers["authorization"] = f"Key {api_key}" + return headers + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + if api_key.startswith("hf_"): + # Use the queue subdomain for HF routing + return f"/{mapped_model}?_subdomain=queue" + return f"/{mapped_model}" + + def get_response( + self, + response: Union[bytes, Dict], + request_params: Optional[RequestParameters] = None, + ) -> Any: + response_dict = _as_dict(response) + + request_id = response_dict.get("request_id") + if not request_id: + raise ValueError("No request ID found in the response") + if request_params is None: + raise ValueError( + f"A 
`RequestParameters` object should be provided to get {self.task} responses with Fal AI." + ) + + # extract the base url and query params + parsed_url = urlparse(request_params.url) + # a bit hacky way to concatenate the provider name without parsing `parsed_url.path` + base_url = f"{parsed_url.scheme}://{parsed_url.netloc}{'/fal-ai' if parsed_url.netloc == 'router.huggingface.co' else ''}" + query_param = f"?{parsed_url.query}" if parsed_url.query else "" + + # extracting the provider model id for status and result urls + # from the response as it might be different from the mapped model in `request_params.url` + model_id = urlparse(response_dict.get("response_url")).path + status_url = f"{base_url}{str(model_id)}/status{query_param}" + result_url = f"{base_url}{str(model_id)}{query_param}" + + status = response_dict.get("status") + logger.info("Generating the output.. this can take several minutes.") + while status != "COMPLETED": + time.sleep(_POLLING_INTERVAL) + status_response = get_session().get(status_url, headers=request_params.headers) + hf_raise_for_status(status_response) + status = status_response.json().get("status") + + return get_session().get(result_url, headers=request_params.headers).json() + + +class FalAIAutomaticSpeechRecognitionTask(FalAITask): + def __init__(self): + super().__init__("automatic-speech-recognition") + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + if isinstance(inputs, str) and inputs.startswith(("http://", "https://")): + # If input is a URL, pass it directly + audio_url = inputs + else: + # If input is a file path, read it first + if isinstance(inputs, str): + with open(inputs, "rb") as f: + inputs = f.read() + + audio_b64 = base64.b64encode(inputs).decode() + content_type = "audio/mpeg" + audio_url = f"data:{content_type};base64,{audio_b64}" + + return {"audio_url": audio_url, **filter_none(parameters)} + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + text = _as_dict(response)["text"] + if not isinstance(text, str): + raise ValueError(f"Unexpected output format from FalAI API. 
Expected string, got {type(text)}.") + return text + + +class FalAITextToImageTask(FalAITask): + def __init__(self): + super().__init__("text-to-image") + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + payload: Dict[str, Any] = { + "prompt": inputs, + **filter_none(parameters), + } + if "width" in payload and "height" in payload: + payload["image_size"] = { + "width": payload.pop("width"), + "height": payload.pop("height"), + } + if provider_mapping_info.adapter_weights_path is not None: + lora_path = constants.HUGGINGFACE_CO_URL_TEMPLATE.format( + repo_id=provider_mapping_info.hf_model_id, + revision="main", + filename=provider_mapping_info.adapter_weights_path, + ) + payload["loras"] = [{"path": lora_path, "scale": 1}] + if provider_mapping_info.provider_id == "fal-ai/lora": + # little hack: fal requires the base model for stable-diffusion-based loras but not for flux-based + # See payloads in https://fal.ai/models/fal-ai/lora/api vs https://fal.ai/models/fal-ai/flux-lora/api + payload["model_name"] = "stabilityai/stable-diffusion-xl-base-1.0" + + return payload + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + url = _as_dict(response)["images"][0]["url"] + return get_session().get(url).content + + +class FalAITextToSpeechTask(FalAITask): + def __init__(self): + super().__init__("text-to-speech") + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + return {"text": inputs, **filter_none(parameters)} + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + url = _as_dict(response)["audio"]["url"] + return get_session().get(url).content + + +class FalAITextToVideoTask(FalAIQueueTask): + def __init__(self): + super().__init__("text-to-video") + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + return {"prompt": inputs, **filter_none(parameters)} + + def get_response( + self, + response: Union[bytes, Dict], + request_params: Optional[RequestParameters] = None, + ) -> Any: + output = super().get_response(response, request_params) + url = _as_dict(output)["video"]["url"] + return get_session().get(url).content + + +class FalAIImageToImageTask(FalAIQueueTask): + def __init__(self): + super().__init__("image-to-image") + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + image_url = _as_url(inputs, default_mime_type="image/jpeg") + if "target_size" in parameters: + parameters["image_size"] = parameters.pop("target_size") + payload: Dict[str, Any] = { + "image_url": image_url, + **filter_none(parameters), + } + if provider_mapping_info.adapter_weights_path is not None: + lora_path = constants.HUGGINGFACE_CO_URL_TEMPLATE.format( + repo_id=provider_mapping_info.hf_model_id, + revision="main", + filename=provider_mapping_info.adapter_weights_path, + ) + payload["loras"] = [{"path": lora_path, "scale": 1}] + + return payload + + def get_response( + self, + response: Union[bytes, Dict], + request_params: Optional[RequestParameters] = None, + ) -> Any: + output = super().get_response(response, request_params) + url = _as_dict(output)["images"][0]["url"] + return get_session().get(url).content + + +class 
FalAIImageToVideoTask(FalAIQueueTask): + def __init__(self): + super().__init__("image-to-video") + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + image_url = _as_url(inputs, default_mime_type="image/jpeg") + payload: Dict[str, Any] = { + "image_url": image_url, + **filter_none(parameters), + } + if provider_mapping_info.adapter_weights_path is not None: + lora_path = constants.HUGGINGFACE_CO_URL_TEMPLATE.format( + repo_id=provider_mapping_info.hf_model_id, + revision="main", + filename=provider_mapping_info.adapter_weights_path, + ) + payload["loras"] = [{"path": lora_path, "scale": 1}] + return payload + + def get_response( + self, + response: Union[bytes, Dict], + request_params: Optional[RequestParameters] = None, + ) -> Any: + output = super().get_response(response, request_params) + url = _as_dict(output)["video"]["url"] + return get_session().get(url).content diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/featherless_ai.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/featherless_ai.py new file mode 100644 index 0000000000000000000000000000000000000000..6ad1c48134f5c990b6ac4fca5ff919f4cc0d2373 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/featherless_ai.py @@ -0,0 +1,38 @@ +from typing import Any, Dict, Optional, Union + +from huggingface_hub.hf_api import InferenceProviderMapping +from huggingface_hub.inference._common import RequestParameters, _as_dict + +from ._common import BaseConversationalTask, BaseTextGenerationTask, filter_none + + +_PROVIDER = "featherless-ai" +_BASE_URL = "https://api.featherless.ai" + + +class FeatherlessTextGenerationTask(BaseTextGenerationTask): + def __init__(self): + super().__init__(provider=_PROVIDER, base_url=_BASE_URL) + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + params = filter_none(parameters.copy()) + params["max_tokens"] = params.pop("max_new_tokens", None) + + return {"prompt": inputs, **params, "model": provider_mapping_info.provider_id} + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + output = _as_dict(response)["choices"][0] + return { + "generated_text": output["text"], + "details": { + "finish_reason": output.get("finish_reason"), + "seed": output.get("seed"), + }, + } + + +class FeatherlessConversationalTask(BaseConversationalTask): + def __init__(self): + super().__init__(provider=_PROVIDER, base_url=_BASE_URL) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/fireworks_ai.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/fireworks_ai.py new file mode 100644 index 0000000000000000000000000000000000000000..b4cc19a5700047f6516b2784d9785a99d7e32451 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/fireworks_ai.py @@ -0,0 +1,27 @@ +from typing import Any, Dict, Optional + +from huggingface_hub.hf_api import InferenceProviderMapping + +from ._common import BaseConversationalTask + + +class FireworksAIConversationalTask(BaseConversationalTask): + def __init__(self): + super().__init__(provider="fireworks-ai", base_url="https://api.fireworks.ai") + + def _prepare_route(self, mapped_model: str, api_key: str) 
-> str: + return "/inference/v1/chat/completions" + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + payload = super()._prepare_payload_as_dict(inputs, parameters, provider_mapping_info) + response_format = parameters.get("response_format") + if isinstance(response_format, dict) and response_format.get("type") == "json_schema": + json_schema_details = response_format.get("json_schema") + if isinstance(json_schema_details, dict) and "schema" in json_schema_details: + payload["response_format"] = { # type: ignore [index] + "type": "json_object", + "schema": json_schema_details["schema"], + } + return payload diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/groq.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/groq.py new file mode 100644 index 0000000000000000000000000000000000000000..11e677504e89bc02b966e7d37d9e11f1b94b297f --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/groq.py @@ -0,0 +1,9 @@ +from ._common import BaseConversationalTask + + +class GroqConversationalTask(BaseConversationalTask): + def __init__(self): + super().__init__(provider="groq", base_url="https://api.groq.com") + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + return "/openai/v1/chat/completions" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/hf_inference.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/hf_inference.py new file mode 100644 index 0000000000000000000000000000000000000000..d90d00c4f3e5b93029ed979df6e310635a639d93 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/hf_inference.py @@ -0,0 +1,228 @@ +import json +from functools import lru_cache +from pathlib import Path +from typing import Any, Dict, Optional, Union +from urllib.parse import urlparse, urlunparse + +from huggingface_hub import constants +from huggingface_hub.hf_api import InferenceProviderMapping +from huggingface_hub.inference._common import ( + MimeBytes, + RequestParameters, + _b64_encode, + _bytes_to_dict, + _open_as_mime_bytes, +) +from huggingface_hub.inference._providers._common import TaskProviderHelper, filter_none +from huggingface_hub.utils import build_hf_headers, get_session, get_token, hf_raise_for_status + + +class HFInferenceTask(TaskProviderHelper): + """Base class for HF Inference API tasks.""" + + def __init__(self, task: str): + super().__init__( + provider="hf-inference", + base_url=constants.INFERENCE_PROXY_TEMPLATE.format(provider="hf-inference"), + task=task, + ) + + def _prepare_api_key(self, api_key: Optional[str]) -> str: + # special case: for HF Inference we allow not providing an API key + return api_key or get_token() # type: ignore[return-value] + + def _prepare_mapping_info(self, model: Optional[str]) -> InferenceProviderMapping: + if model is not None and model.startswith(("http://", "https://")): + return InferenceProviderMapping( + provider="hf-inference", providerId=model, hf_model_id=model, task=self.task, status="live" + ) + model_id = model if model is not None else _fetch_recommended_models().get(self.task) + if model_id is None: + raise ValueError( + f"Task {self.task} has no recommended model for HF Inference. Please specify a model" + " explicitly. Visit https://huggingface.co/tasks for more info." 
+ ) + _check_supported_task(model_id, self.task) + return InferenceProviderMapping( + provider="hf-inference", providerId=model_id, hf_model_id=model_id, task=self.task, status="live" + ) + + def _prepare_url(self, api_key: str, mapped_model: str) -> str: + # hf-inference provider can handle URLs (e.g. Inference Endpoints or TGI deployment) + if mapped_model.startswith(("http://", "https://")): + return mapped_model + return ( + # Feature-extraction and sentence-similarity are the only cases where we handle models with several tasks. + f"{self.base_url}/models/{mapped_model}/pipeline/{self.task}" + if self.task in ("feature-extraction", "sentence-similarity") + # Otherwise, we use the default endpoint + else f"{self.base_url}/models/{mapped_model}" + ) + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + if isinstance(inputs, bytes): + raise ValueError(f"Unexpected binary input for task {self.task}.") + if isinstance(inputs, Path): + raise ValueError(f"Unexpected path input for task {self.task} (got {inputs})") + return filter_none({"inputs": inputs, "parameters": parameters}) + + +class HFInferenceBinaryInputTask(HFInferenceTask): + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + return None + + def _prepare_payload_as_bytes( + self, + inputs: Any, + parameters: Dict, + provider_mapping_info: InferenceProviderMapping, + extra_payload: Optional[Dict], + ) -> Optional[MimeBytes]: + parameters = filter_none(parameters) + extra_payload = extra_payload or {} + has_parameters = len(parameters) > 0 or len(extra_payload) > 0 + + # Raise if not a binary object or a local path or a URL. + if not isinstance(inputs, (bytes, Path)) and not isinstance(inputs, str): + raise ValueError(f"Expected binary inputs or a local path or a URL. 
Got {inputs}") + + # Send inputs as raw content when no parameters are provided + if not has_parameters: + return _open_as_mime_bytes(inputs) + + # Otherwise encode as b64 + return MimeBytes( + json.dumps({"inputs": _b64_encode(inputs), "parameters": parameters, **extra_payload}).encode("utf-8"), + mime_type="application/json", + ) + + +class HFInferenceConversational(HFInferenceTask): + def __init__(self): + super().__init__("conversational") + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + payload = filter_none(parameters) + mapped_model = provider_mapping_info.provider_id + payload_model = parameters.get("model") or mapped_model + + if payload_model is None or payload_model.startswith(("http://", "https://")): + payload_model = "dummy" + + response_format = parameters.get("response_format") + if isinstance(response_format, dict) and response_format.get("type") == "json_schema": + payload["response_format"] = { + "type": "json_object", + "value": response_format["json_schema"]["schema"], + } + return {**payload, "model": payload_model, "messages": inputs} + + def _prepare_url(self, api_key: str, mapped_model: str) -> str: + base_url = ( + mapped_model + if mapped_model.startswith(("http://", "https://")) + else f"{constants.INFERENCE_PROXY_TEMPLATE.format(provider='hf-inference')}/models/{mapped_model}" + ) + return _build_chat_completion_url(base_url) + + +def _build_chat_completion_url(model_url: str) -> str: + parsed = urlparse(model_url) + path = parsed.path.rstrip("/") + + # If the path already ends with /chat/completions, we're done! + if path.endswith("/chat/completions"): + return model_url + + # Append /chat/completions if not already present + if path.endswith("/v1"): + new_path = path + "/chat/completions" + # If path was empty or just "/", set the full path + elif not path: + new_path = "/v1/chat/completions" + # Append /v1/chat/completions if not already present + else: + new_path = path + "/v1/chat/completions" + + # Reconstruct the URL with the new path and original query parameters. 
+ new_parsed = parsed._replace(path=new_path) + return str(urlunparse(new_parsed)) + + +@lru_cache(maxsize=1) +def _fetch_recommended_models() -> Dict[str, Optional[str]]: + response = get_session().get(f"{constants.ENDPOINT}/api/tasks", headers=build_hf_headers()) + hf_raise_for_status(response) + return {task: next(iter(details["widgetModels"]), None) for task, details in response.json().items()} + + +@lru_cache(maxsize=None) +def _check_supported_task(model: str, task: str) -> None: + from huggingface_hub.hf_api import HfApi + + model_info = HfApi().model_info(model) + pipeline_tag = model_info.pipeline_tag + tags = model_info.tags or [] + is_conversational = "conversational" in tags + if task in ("text-generation", "conversational"): + if pipeline_tag == "text-generation": + # text-generation + conversational tag -> both tasks allowed + if is_conversational: + return + # text-generation without conversational tag -> only text-generation allowed + if task == "text-generation": + return + raise ValueError(f"Model '{model}' doesn't support task '{task}'.") + + if pipeline_tag == "text2text-generation": + if task == "text-generation": + return + raise ValueError(f"Model '{model}' doesn't support task '{task}'.") + + if pipeline_tag == "image-text-to-text": + if is_conversational and task == "conversational": + return # Only conversational allowed if tagged as conversational + raise ValueError("Non-conversational image-text-to-text task is not supported.") + + if ( + task in ("feature-extraction", "sentence-similarity") + and pipeline_tag in ("feature-extraction", "sentence-similarity") + and task in tags + ): + # feature-extraction and sentence-similarity are interchangeable for HF Inference + return + + # For all other tasks, just check pipeline tag + if pipeline_tag != task: + raise ValueError( + f"Model '{model}' doesn't support task '{task}'. 
Supported tasks: '{pipeline_tag}', got: '{task}'" + ) + return + + +class HFInferenceFeatureExtractionTask(HFInferenceTask): + def __init__(self): + super().__init__("feature-extraction") + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + if isinstance(inputs, bytes): + raise ValueError(f"Unexpected binary input for task {self.task}.") + if isinstance(inputs, Path): + raise ValueError(f"Unexpected path input for task {self.task} (got {inputs})") + + # Parameters are sent at root-level for feature-extraction task + # See specs: https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/src/tasks/feature-extraction/spec/input.json + return {"inputs": inputs, **filter_none(parameters)} + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + if isinstance(response, bytes): + return _bytes_to_dict(response) + return response diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/hyperbolic.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/hyperbolic.py new file mode 100644 index 0000000000000000000000000000000000000000..6dcb14cc275f6b80db5643361b9dfd3cbf8d91a2 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/hyperbolic.py @@ -0,0 +1,47 @@ +import base64 +from typing import Any, Dict, Optional, Union + +from huggingface_hub.hf_api import InferenceProviderMapping +from huggingface_hub.inference._common import RequestParameters, _as_dict +from huggingface_hub.inference._providers._common import BaseConversationalTask, TaskProviderHelper, filter_none + + +class HyperbolicTextToImageTask(TaskProviderHelper): + def __init__(self): + super().__init__(provider="hyperbolic", base_url="https://api.hyperbolic.xyz", task="text-to-image") + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + return "/v1/images/generations" + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + mapped_model = provider_mapping_info.provider_id + parameters = filter_none(parameters) + if "num_inference_steps" in parameters: + parameters["steps"] = parameters.pop("num_inference_steps") + if "guidance_scale" in parameters: + parameters["cfg_scale"] = parameters.pop("guidance_scale") + # For Hyperbolic, the width and height are required parameters + if "width" not in parameters: + parameters["width"] = 512 + if "height" not in parameters: + parameters["height"] = 512 + return {"prompt": inputs, "model_name": mapped_model, **parameters} + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + response_dict = _as_dict(response) + return base64.b64decode(response_dict["images"][0]["image"]) + + +class HyperbolicTextGenerationTask(BaseConversationalTask): + """ + Special case for Hyperbolic, where text-generation task is handled as a conversational task. 
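+ For example (editor's note): requests are routed to the provider's /v1/chat/completions endpoint with a chat-style payload, rather than to a dedicated completions route.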
+ """ + + def __init__(self, task: str): + super().__init__( + provider="hyperbolic", + base_url="https://api.hyperbolic.xyz", + ) + self.task = task diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/nebius.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/nebius.py new file mode 100644 index 0000000000000000000000000000000000000000..85ad67c4c8835d7fb8bfe5f36e426614174a66ba --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/nebius.py @@ -0,0 +1,83 @@ +import base64 +from typing import Any, Dict, Optional, Union + +from huggingface_hub.hf_api import InferenceProviderMapping +from huggingface_hub.inference._common import RequestParameters, _as_dict +from huggingface_hub.inference._providers._common import ( + BaseConversationalTask, + BaseTextGenerationTask, + TaskProviderHelper, + filter_none, +) + + +class NebiusTextGenerationTask(BaseTextGenerationTask): + def __init__(self): + super().__init__(provider="nebius", base_url="https://api.studio.nebius.ai") + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + output = _as_dict(response)["choices"][0] + return { + "generated_text": output["text"], + "details": { + "finish_reason": output.get("finish_reason"), + "seed": output.get("seed"), + }, + } + + +class NebiusConversationalTask(BaseConversationalTask): + def __init__(self): + super().__init__(provider="nebius", base_url="https://api.studio.nebius.ai") + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + payload = super()._prepare_payload_as_dict(inputs, parameters, provider_mapping_info) + response_format = parameters.get("response_format") + if isinstance(response_format, dict) and response_format.get("type") == "json_schema": + json_schema_details = response_format.get("json_schema") + if isinstance(json_schema_details, dict) and "schema" in json_schema_details: + payload["guided_json"] = json_schema_details["schema"] # type: ignore [index] + return payload + + +class NebiusTextToImageTask(TaskProviderHelper): + def __init__(self): + super().__init__(task="text-to-image", provider="nebius", base_url="https://api.studio.nebius.ai") + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + return "/v1/images/generations" + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + mapped_model = provider_mapping_info.provider_id + parameters = filter_none(parameters) + if "guidance_scale" in parameters: + parameters.pop("guidance_scale") + if parameters.get("response_format") not in ("b64_json", "url"): + parameters["response_format"] = "b64_json" + + return {"prompt": inputs, **parameters, "model": mapped_model} + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + response_dict = _as_dict(response) + return base64.b64decode(response_dict["data"][0]["b64_json"]) + + +class NebiusFeatureExtractionTask(TaskProviderHelper): + def __init__(self): + super().__init__(task="feature-extraction", provider="nebius", base_url="https://api.studio.nebius.ai") + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + return "/v1/embeddings" + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: 
InferenceProviderMapping + ) -> Optional[Dict]: + return {"input": inputs, "model": provider_mapping_info.provider_id} + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + embeddings = _as_dict(response)["data"] + return [embedding["embedding"] for embedding in embeddings] diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/novita.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/novita.py new file mode 100644 index 0000000000000000000000000000000000000000..44adc9017b456f487513cde251086075d84b69f0 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/novita.py @@ -0,0 +1,69 @@ +from typing import Any, Dict, Optional, Union + +from huggingface_hub.hf_api import InferenceProviderMapping +from huggingface_hub.inference._common import RequestParameters, _as_dict +from huggingface_hub.inference._providers._common import ( + BaseConversationalTask, + BaseTextGenerationTask, + TaskProviderHelper, + filter_none, +) +from huggingface_hub.utils import get_session + + +_PROVIDER = "novita" +_BASE_URL = "https://api.novita.ai" + + +class NovitaTextGenerationTask(BaseTextGenerationTask): + def __init__(self): + super().__init__(provider=_PROVIDER, base_url=_BASE_URL) + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + # there is no v1/ route for novita + return "/v3/openai/completions" + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + output = _as_dict(response)["choices"][0] + return { + "generated_text": output["text"], + "details": { + "finish_reason": output.get("finish_reason"), + "seed": output.get("seed"), + }, + } + + +class NovitaConversationalTask(BaseConversationalTask): + def __init__(self): + super().__init__(provider=_PROVIDER, base_url=_BASE_URL) + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + # there is no v1/ route for novita + return "/v3/openai/chat/completions" + + +class NovitaTextToVideoTask(TaskProviderHelper): + def __init__(self): + super().__init__(provider=_PROVIDER, base_url=_BASE_URL, task="text-to-video") + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + return f"/v3/hf/{mapped_model}" + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + return {"prompt": inputs, **filter_none(parameters)} + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + response_dict = _as_dict(response) + if not ( + isinstance(response_dict, dict) + and "video" in response_dict + and isinstance(response_dict["video"], dict) + and "video_url" in response_dict["video"] + ): + raise ValueError("Expected response format: { 'video': { 'video_url': string } }") + + video_url = response_dict["video"]["video_url"] + return get_session().get(video_url).content diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/nscale.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/nscale.py new file mode 100644 index 0000000000000000000000000000000000000000..ce5b20e354e246e93a7dd9831e4acf69ebcfad63 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/nscale.py @@ -0,0 +1,44 @@ +import base64 +from typing import Any, 
Dict, Optional, Union + +from huggingface_hub.hf_api import InferenceProviderMapping +from huggingface_hub.inference._common import RequestParameters, _as_dict + +from ._common import BaseConversationalTask, TaskProviderHelper, filter_none + + +class NscaleConversationalTask(BaseConversationalTask): + def __init__(self): + super().__init__(provider="nscale", base_url="https://inference.api.nscale.com") + + +class NscaleTextToImageTask(TaskProviderHelper): + def __init__(self): + super().__init__(provider="nscale", base_url="https://inference.api.nscale.com", task="text-to-image") + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + return "/v1/images/generations" + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + mapped_model = provider_mapping_info.provider_id + # Combine all parameters except inputs and parameters + parameters = filter_none(parameters) + if "width" in parameters and "height" in parameters: + parameters["size"] = f"{parameters.pop('width')}x{parameters.pop('height')}" + if "num_inference_steps" in parameters: + parameters.pop("num_inference_steps") + if "cfg_scale" in parameters: + parameters.pop("cfg_scale") + payload = { + "response_format": "b64_json", + "prompt": inputs, + "model": mapped_model, + **parameters, + } + return payload + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + response_dict = _as_dict(response) + return base64.b64decode(response_dict["data"][0]["b64_json"]) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/openai.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/openai.py new file mode 100644 index 0000000000000000000000000000000000000000..7a554093c173ea8f664cb7fbd9616ce3a08ce78c --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/openai.py @@ -0,0 +1,25 @@ +from typing import Optional + +from huggingface_hub.hf_api import InferenceProviderMapping +from huggingface_hub.inference._providers._common import BaseConversationalTask + + +class OpenAIConversationalTask(BaseConversationalTask): + def __init__(self): + super().__init__(provider="openai", base_url="https://api.openai.com") + + def _prepare_api_key(self, api_key: Optional[str]) -> str: + if api_key is None: + raise ValueError("You must provide an api_key to work with OpenAI API.") + if api_key.startswith("hf_"): + raise ValueError( + "OpenAI provider is not available through Hugging Face routing, please use your own OpenAI API key." + ) + return api_key + + def _prepare_mapping_info(self, model: Optional[str]) -> InferenceProviderMapping: + if model is None: + raise ValueError("Please provide an OpenAI model ID, e.g. 
`gpt-4o` or `o1`.") + return InferenceProviderMapping( + provider="openai", providerId=model, task="conversational", status="live", hf_model_id=model + ) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/publicai.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/publicai.py new file mode 100644 index 0000000000000000000000000000000000000000..4c88528e4f1e2eefaf6be9315c490db19ff5ca1e --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/publicai.py @@ -0,0 +1,6 @@ +from ._common import BaseConversationalTask + + +class PublicAIConversationalTask(BaseConversationalTask): + def __init__(self): + super().__init__(provider="publicai", base_url="https://api.publicai.co") diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/replicate.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/replicate.py new file mode 100644 index 0000000000000000000000000000000000000000..139582cc801eaf0bdd93e006df404432f2375fb3 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/replicate.py @@ -0,0 +1,90 @@ +from typing import Any, Dict, Optional, Union + +from huggingface_hub.hf_api import InferenceProviderMapping +from huggingface_hub.inference._common import RequestParameters, _as_dict, _as_url +from huggingface_hub.inference._providers._common import TaskProviderHelper, filter_none +from huggingface_hub.utils import get_session + + +_PROVIDER = "replicate" +_BASE_URL = "https://api.replicate.com" + + +class ReplicateTask(TaskProviderHelper): + def __init__(self, task: str): + super().__init__(provider=_PROVIDER, base_url=_BASE_URL, task=task) + + def _prepare_headers(self, headers: Dict, api_key: str) -> Dict[str, Any]: + headers = super()._prepare_headers(headers, api_key) + headers["Prefer"] = "wait" + return headers + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + if ":" in mapped_model: + return "/v1/predictions" + return f"/v1/models/{mapped_model}/predictions" + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + mapped_model = provider_mapping_info.provider_id + payload: Dict[str, Any] = {"input": {"prompt": inputs, **filter_none(parameters)}} + if ":" in mapped_model: + version = mapped_model.split(":", 1)[1] + payload["version"] = version + return payload + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + response_dict = _as_dict(response) + if response_dict.get("output") is None: + raise TimeoutError( + f"Inference request timed out after 60 seconds. No output generated for model {response_dict.get('model')}" + "The model might be in cold state or starting up. Please try again later." 
+ ) + output_url = ( + response_dict["output"] if isinstance(response_dict["output"], str) else response_dict["output"][0] + ) + return get_session().get(output_url).content + + +class ReplicateTextToImageTask(ReplicateTask): + def __init__(self): + super().__init__("text-to-image") + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + payload: Dict = super()._prepare_payload_as_dict(inputs, parameters, provider_mapping_info) # type: ignore[assignment] + if provider_mapping_info.adapter_weights_path is not None: + payload["input"]["lora_weights"] = f"https://huggingface.co/{provider_mapping_info.hf_model_id}" + return payload + + +class ReplicateTextToSpeechTask(ReplicateTask): + def __init__(self): + super().__init__("text-to-speech") + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + payload: Dict = super()._prepare_payload_as_dict(inputs, parameters, provider_mapping_info) # type: ignore[assignment] + payload["input"]["text"] = payload["input"].pop("prompt") # rename "prompt" to "text" for TTS + return payload + + +class ReplicateImageToImageTask(ReplicateTask): + def __init__(self): + super().__init__("image-to-image") + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + image_url = _as_url(inputs, default_mime_type="image/jpeg") + + payload: Dict[str, Any] = {"input": {"input_image": image_url, **filter_none(parameters)}} + + mapped_model = provider_mapping_info.provider_id + if ":" in mapped_model: + version = mapped_model.split(":", 1)[1] + payload["version"] = version + return payload diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/sambanova.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/sambanova.py new file mode 100644 index 0000000000000000000000000000000000000000..ed96fb766ce49003b605bda8ef8ee34da0ebe2f4 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/sambanova.py @@ -0,0 +1,42 @@ +from typing import Any, Dict, Optional, Union + +from huggingface_hub.hf_api import InferenceProviderMapping +from huggingface_hub.inference._common import RequestParameters, _as_dict +from huggingface_hub.inference._providers._common import BaseConversationalTask, TaskProviderHelper, filter_none + + +class SambanovaConversationalTask(BaseConversationalTask): + def __init__(self): + super().__init__(provider="sambanova", base_url="https://api.sambanova.ai") + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + response_format_config = parameters.get("response_format") + if isinstance(response_format_config, dict): + if response_format_config.get("type") == "json_schema": + json_schema_config = response_format_config.get("json_schema", {}) + strict = json_schema_config.get("strict") + if isinstance(json_schema_config, dict) and (strict is True or strict is None): + json_schema_config["strict"] = False + + payload = super()._prepare_payload_as_dict(inputs, parameters, provider_mapping_info) + return payload + + +class SambanovaFeatureExtractionTask(TaskProviderHelper): + def __init__(self): + super().__init__(provider="sambanova", base_url="https://api.sambanova.ai", task="feature-extraction") + + 
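+ # Editor's sketch (not part of the upstream file): given the helpers below, a feature-extraction call amounts to + # POST https://api.sambanova.ai/v1/embeddings with body {"input": <text(s)>, "model": "<provider model id>", ...}, + # and get_response() extracts the "embedding" vectors from the OpenAI-style "data" list.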
def _prepare_route(self, mapped_model: str, api_key: str) -> str: + return "/v1/embeddings" + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + parameters = filter_none(parameters) + return {"input": inputs, "model": provider_mapping_info.provider_id, **parameters} + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + embeddings = _as_dict(response)["data"] + return [embedding["embedding"] for embedding in embeddings] diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/scaleway.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/scaleway.py new file mode 100644 index 0000000000000000000000000000000000000000..cfdd75416f1a11f3f4908d1c29541920cba76d79 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/scaleway.py @@ -0,0 +1,28 @@ +from typing import Any, Dict, Optional, Union + +from huggingface_hub.inference._common import RequestParameters, _as_dict + +from ._common import BaseConversationalTask, InferenceProviderMapping, TaskProviderHelper, filter_none + + +class ScalewayConversationalTask(BaseConversationalTask): + def __init__(self): + super().__init__(provider="scaleway", base_url="https://api.scaleway.ai") + + +class ScalewayFeatureExtractionTask(TaskProviderHelper): + def __init__(self): + super().__init__(provider="scaleway", base_url="https://api.scaleway.ai", task="feature-extraction") + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + return "/v1/embeddings" + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + parameters = filter_none(parameters) + return {"input": inputs, "model": provider_mapping_info.provider_id, **parameters} + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + embeddings = _as_dict(response)["data"] + return [embedding["embedding"] for embedding in embeddings] diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/together.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/together.py new file mode 100644 index 0000000000000000000000000000000000000000..de166b7baf8d50b255f29cf8cc9b9d3fa639646e --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/together.py @@ -0,0 +1,88 @@ +import base64 +from abc import ABC +from typing import Any, Dict, Optional, Union + +from huggingface_hub.hf_api import InferenceProviderMapping +from huggingface_hub.inference._common import RequestParameters, _as_dict +from huggingface_hub.inference._providers._common import ( + BaseConversationalTask, + BaseTextGenerationTask, + TaskProviderHelper, + filter_none, +) + + +_PROVIDER = "together" +_BASE_URL = "https://api.together.xyz" + + +class TogetherTask(TaskProviderHelper, ABC): + """Base class for Together API tasks.""" + + def __init__(self, task: str): + super().__init__(provider=_PROVIDER, base_url=_BASE_URL, task=task) + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + if self.task == "text-to-image": + return "/v1/images/generations" + elif self.task == "conversational": + return "/v1/chat/completions" + elif self.task == "text-generation": + return "/v1/completions" + raise 
ValueError(f"Unsupported task '{self.task}' for Together API.") + + +class TogetherTextGenerationTask(BaseTextGenerationTask): + def __init__(self): + super().__init__(provider=_PROVIDER, base_url=_BASE_URL) + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + output = _as_dict(response)["choices"][0] + return { + "generated_text": output["text"], + "details": { + "finish_reason": output.get("finish_reason"), + "seed": output.get("seed"), + }, + } + + +class TogetherConversationalTask(BaseConversationalTask): + def __init__(self): + super().__init__(provider=_PROVIDER, base_url=_BASE_URL) + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + payload = super()._prepare_payload_as_dict(inputs, parameters, provider_mapping_info) + response_format = parameters.get("response_format") + if isinstance(response_format, dict) and response_format.get("type") == "json_schema": + json_schema_details = response_format.get("json_schema") + if isinstance(json_schema_details, dict) and "schema" in json_schema_details: + payload["response_format"] = { # type: ignore [index] + "type": "json_object", + "schema": json_schema_details["schema"], + } + + return payload + + +class TogetherTextToImageTask(TogetherTask): + def __init__(self): + super().__init__("text-to-image") + + def _prepare_payload_as_dict( + self, inputs: Any, parameters: Dict, provider_mapping_info: InferenceProviderMapping + ) -> Optional[Dict]: + mapped_model = provider_mapping_info.provider_id + parameters = filter_none(parameters) + if "num_inference_steps" in parameters: + parameters["steps"] = parameters.pop("num_inference_steps") + if "guidance_scale" in parameters: + parameters["guidance"] = parameters.pop("guidance_scale") + + return {"prompt": inputs, "response_format": "base64", **parameters, "model": mapped_model} + + def get_response(self, response: Union[bytes, Dict], request_params: Optional[RequestParameters] = None) -> Any: + response_dict = _as_dict(response) + return base64.b64decode(response_dict["data"][0]["b64_json"]) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/zai_org.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/zai_org.py new file mode 100644 index 0000000000000000000000000000000000000000..d6f4c42b5abc78a98474b2f8899d6b30a4a58f8d --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/inference/_providers/zai_org.py @@ -0,0 +1,17 @@ +from typing import Any, Dict + +from huggingface_hub.inference._providers._common import BaseConversationalTask + + +class ZaiConversationalTask(BaseConversationalTask): + def __init__(self): + super().__init__(provider="zai-org", base_url="https://api.z.ai") + + def _prepare_headers(self, headers: Dict, api_key: str) -> Dict[str, Any]: + headers = super()._prepare_headers(headers, api_key) + headers["Accept-Language"] = "en-US,en" + headers["x-source-channel"] = "hugging_face" + return headers + + def _prepare_route(self, mapped_model: str, api_key: str) -> str: + return "/api/paas/v4/chat/completions" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..8949a22a5f65ab29b7df65aa6a9df9bce0544b7e --- /dev/null +++ 
b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__init__.py @@ -0,0 +1,27 @@ +# Copyright 2024 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ruff: noqa: F401 +"""Contains helpers to serialize tensors.""" + +from ._base import StateDictSplit, split_state_dict_into_shards_factory +from ._tensorflow import get_tf_storage_size, split_tf_state_dict_into_shards +from ._torch import ( + get_torch_storage_id, + get_torch_storage_size, + load_state_dict_from_file, + load_torch_model, + save_torch_model, + save_torch_state_dict, + split_torch_state_dict_into_shards, +) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..8bccb6a4e47025f5b54994ace458adfcea4899b0 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__pycache__/_base.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__pycache__/_base.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..61aa180abea782ca123f4bb23320e7c4872a5aa9 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__pycache__/_base.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__pycache__/_dduf.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__pycache__/_dduf.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..bc12f1741970db83198542b9184717e494a7ec13 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__pycache__/_dduf.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__pycache__/_tensorflow.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__pycache__/_tensorflow.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6ff33fdafb1bb2ac4d40c0a1aead746517981199 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__pycache__/_tensorflow.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__pycache__/_torch.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__pycache__/_torch.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..d467c1c85ae68b00afee8dffb7c5bf998e6b4234 Binary files /dev/null and 
b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/__pycache__/_torch.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/_base.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/_base.py new file mode 100644 index 0000000000000000000000000000000000000000..b79c82f5dba58d252b5c3a7345f0df09794b55ce --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/_base.py @@ -0,0 +1,207 @@ +# Copyright 2024 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains helpers to split tensors into shards.""" + +from dataclasses import dataclass, field +from typing import Any, Callable, Dict, List, Optional, TypeVar, Union + +from .. import logging + + +TensorT = TypeVar("TensorT") +TensorSizeFn_T = Callable[[TensorT], int] +StorageIDFn_T = Callable[[TensorT], Optional[Any]] + +MAX_SHARD_SIZE = "5GB" +SIZE_UNITS = { + "TB": 10**12, + "GB": 10**9, + "MB": 10**6, + "KB": 10**3, +} + + +logger = logging.get_logger(__file__) + + +@dataclass +class StateDictSplit: + is_sharded: bool = field(init=False) + metadata: Dict[str, Any] + filename_to_tensors: Dict[str, List[str]] + tensor_to_filename: Dict[str, str] + + def __post_init__(self): + self.is_sharded = len(self.filename_to_tensors) > 1 + + +def split_state_dict_into_shards_factory( + state_dict: Dict[str, TensorT], + *, + get_storage_size: TensorSizeFn_T, + filename_pattern: str, + get_storage_id: StorageIDFn_T = lambda tensor: None, + max_shard_size: Union[int, str] = MAX_SHARD_SIZE, +) -> StateDictSplit: + """ + Split a model state dictionary in shards so that each shard is smaller than a given size. + + The shards are determined by iterating through the `state_dict` in the order of its keys. There is no optimization + made to make each shard as close as possible to the maximum size passed. For example, if the limit is 10GB and we + have tensors of sizes [6GB, 6GB, 2GB, 6GB, 2GB, 2GB] they will get sharded as [6GB], [6+2GB], [6+2+2GB] and not + [6+2+2GB], [6+2GB], [6GB]. + + > [!WARNING] + > If one of the model's tensor is bigger than `max_shard_size`, it will end up in its own shard which will have a + > size greater than `max_shard_size`. + + Args: + state_dict (`Dict[str, Tensor]`): + The state dictionary to save. + get_storage_size (`Callable[[Tensor], int]`): + A function that returns the size of a tensor when saved on disk in bytes. + get_storage_id (`Callable[[Tensor], Optional[Any]]`, *optional*): + A function that returns a unique identifier to a tensor storage. Multiple different tensors can share the + same underlying storage. This identifier is guaranteed to be unique and constant for this tensor's storage + during its lifetime. Two tensor storages with non-overlapping lifetimes may have the same id. + filename_pattern (`str`, *optional*): + The pattern to generate the files names in which the model will be saved. 
Pattern must be a string that + can be formatted with `filename_pattern.format(suffix=...)` and must contain the keyword `suffix` + max_shard_size (`int` or `str`, *optional*): + The maximum size of each shard, in bytes. Defaults to 5GB. + + Returns: + [`StateDictSplit`]: A `StateDictSplit` object containing the shards and the index to retrieve them. + """ + storage_id_to_tensors: Dict[Any, List[str]] = {} + + shard_list: List[Dict[str, TensorT]] = [] + current_shard: Dict[str, TensorT] = {} + current_shard_size = 0 + total_size = 0 + + if isinstance(max_shard_size, str): + max_shard_size = parse_size_to_int(max_shard_size) + + for key, tensor in state_dict.items(): + # when bnb serialization is used the weights in the state dict can be strings + # check: https://github.com/huggingface/transformers/pull/24416 for more details + if isinstance(tensor, str): + logger.info("Skipping tensor %s as it is a string (bnb serialization)", key) + continue + + # If a `tensor` shares the same underlying storage as another tensor, we put `tensor` in the same `block` + storage_id = get_storage_id(tensor) + if storage_id is not None: + if storage_id in storage_id_to_tensors: + # We skip this tensor for now and will reassign to correct shard later + storage_id_to_tensors[storage_id].append(key) + continue + else: + # This is the first tensor with this storage_id, we create a new entry + # in the storage_id_to_tensors dict => we will assign the shard id later + storage_id_to_tensors[storage_id] = [key] + + # Compute tensor size + tensor_size = get_storage_size(tensor) + + # If this tensor is bigger than the maximal size, we put it in its own shard + if tensor_size > max_shard_size: + total_size += tensor_size + shard_list.append({key: tensor}) + continue + + # If this tensor is going to tip up over the maximal size, we split. + # Current shard already has some tensors, we add it to the list of shards and create a new one. 
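+ # Editor's worked example (not in the upstream file): with max_shard_size=10GB and tensor sizes + # [6, 6, 2, 6, 2, 2] GB, the loop below yields shards [6], [6, 2], [6, 2, 2]: the current shard is + # flushed whenever the next tensor would push it past the limit, and shards are never rebalanced afterwards.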
+ if current_shard_size + tensor_size > max_shard_size: + shard_list.append(current_shard) + current_shard = {} + current_shard_size = 0 + + # Add the tensor to the current shard + current_shard[key] = tensor + current_shard_size += tensor_size + total_size += tensor_size + + # Add the last shard + if len(current_shard) > 0: + shard_list.append(current_shard) + nb_shards = len(shard_list) + + # Loop over the tensors that share the same storage and assign them together + for storage_id, keys in storage_id_to_tensors.items(): + # Let's try to find the shard where the first tensor of this storage is and put all tensors in the same shard + for shard in shard_list: + if keys[0] in shard: + for key in keys: + shard[key] = state_dict[key] + break + + # If we only have one shard, we return it => no need to build the index + if nb_shards == 1: + filename = filename_pattern.format(suffix="") + return StateDictSplit( + metadata={"total_size": total_size}, + filename_to_tensors={filename: list(state_dict.keys())}, + tensor_to_filename={key: filename for key in state_dict.keys()}, + ) + + # Now that each tensor is assigned to a shard, let's assign a filename to each shard + tensor_name_to_filename = {} + filename_to_tensors = {} + for idx, shard in enumerate(shard_list): + filename = filename_pattern.format(suffix=f"-{idx + 1:05d}-of-{nb_shards:05d}") + for key in shard: + tensor_name_to_filename[key] = filename + filename_to_tensors[filename] = list(shard.keys()) + + # Build the index and return + return StateDictSplit( + metadata={"total_size": total_size}, + filename_to_tensors=filename_to_tensors, + tensor_to_filename=tensor_name_to_filename, + ) + + +def parse_size_to_int(size_as_str: str) -> int: + """ + Parse a size expressed as a string with digits and unit (like `"5MB"`) to an integer (in bytes). + + Supported units are "TB", "GB", "MB", "KB". + + Args: + size_as_str (`str`): The size to convert. Will be directly returned if an `int`. + + Example: + + ```py + >>> parse_size_to_int("5MB") + 5000000 + ``` + """ + size_as_str = size_as_str.strip() + + # Parse unit + unit = size_as_str[-2:].upper() + if unit not in SIZE_UNITS: + raise ValueError(f"Unit '{unit}' not supported. Supported units are TB, GB, MB, KB. 
Got '{size_as_str}'.") + multiplier = SIZE_UNITS[unit] + + # Parse value + try: + value = float(size_as_str[:-2].strip()) + except ValueError as e: + raise ValueError(f"Could not parse the size value from '{size_as_str}': {e}") from e + + return int(value * multiplier) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/_dduf.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/_dduf.py new file mode 100644 index 0000000000000000000000000000000000000000..a1debadb3ac8a45716f0359b932dc065f09edb84 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/_dduf.py @@ -0,0 +1,387 @@ +import json +import logging +import mmap +import os +import shutil +import zipfile +from contextlib import contextmanager +from dataclasses import dataclass, field +from pathlib import Path +from typing import Any, Dict, Generator, Iterable, Tuple, Union + +from ..errors import DDUFCorruptedFileError, DDUFExportError, DDUFInvalidEntryNameError + + +logger = logging.getLogger(__name__) + +DDUF_ALLOWED_ENTRIES = { + # Allowed file extensions in a DDUF file + ".json", + ".model", + ".safetensors", + ".txt", +} + +DDUF_FOLDER_REQUIRED_ENTRIES = { + # Each folder must contain at least one of these entries + "config.json", + "tokenizer_config.json", + "preprocessor_config.json", + "scheduler_config.json", +} + + +@dataclass +class DDUFEntry: + """Object representing a file entry in a DDUF file. + + See [`read_dduf_file`] for how to read a DDUF file. + + Attributes: + filename (str): + The name of the file in the DDUF archive. + offset (int): + The offset of the file in the DDUF archive. + length (int): + The length of the file in the DDUF archive. + dduf_path (str): + The path to the DDUF archive (for internal use). + """ + + filename: str + length: int + offset: int + + dduf_path: Path = field(repr=False) + + @contextmanager + def as_mmap(self) -> Generator[bytes, None, None]: + """Open the file as a memory-mapped file. + + Useful to load safetensors directly from the file. + + Example: + ```py + >>> import safetensors.torch + >>> with entry.as_mmap() as mm: + ... tensors = safetensors.torch.load(mm) + ``` + """ + with self.dduf_path.open("rb") as f: + with mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ) as mm: + yield mm[self.offset : self.offset + self.length] + + def read_text(self, encoding: str = "utf-8") -> str: + """Read the file as text. + + Useful for '.txt' and '.json' entries. + + Example: + ```py + >>> import json + >>> index = json.loads(entry.read_text()) + ``` + """ + with self.dduf_path.open("rb") as f: + f.seek(self.offset) + return f.read(self.length).decode(encoding=encoding) + + +def read_dduf_file(dduf_path: Union[os.PathLike, str]) -> Dict[str, DDUFEntry]: + """ + Read a DDUF file and return a dictionary of entries. + + Only the metadata is read, the data is not loaded in memory. + + Args: + dduf_path (`str` or `os.PathLike`): + The path to the DDUF file to read. + + Returns: + `Dict[str, DDUFEntry]`: + A dictionary of [`DDUFEntry`] indexed by filename. + + Raises: + - [`DDUFCorruptedFileError`]: If the DDUF file is corrupted (i.e. doesn't follow the DDUF format). 
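+ (Editor's note: entries must be stored uncompressed; an entry whose compression type is not ZIP_STORED raises a [`DDUFCorruptedFileError`], as enforced below.)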
+ + Example: + ```python + >>> import json + >>> import safetensors.torch + >>> from huggingface_hub import read_dduf_file + + # Read DDUF metadata + >>> dduf_entries = read_dduf_file("FLUX.1-dev.dduf") + + # Returns a mapping filename <> DDUFEntry + >>> dduf_entries["model_index.json"] + DDUFEntry(filename='model_index.json', offset=66, length=587) + + # Load model index as JSON + >>> json.loads(dduf_entries["model_index.json"].read_text()) + {'_class_name': 'FluxPipeline', '_diffusers_version': '0.32.0.dev0', '_name_or_path': 'black-forest-labs/FLUX.1-dev', ... + + # Load VAE weights using safetensors + >>> with dduf_entries["vae/diffusion_pytorch_model.safetensors"].as_mmap() as mm: + ... state_dict = safetensors.torch.load(mm) + ``` + """ + entries = {} + dduf_path = Path(dduf_path) + logger.info(f"Reading DDUF file {dduf_path}") + with zipfile.ZipFile(str(dduf_path), "r") as zf: + for info in zf.infolist(): + logger.debug(f"Reading entry {info.filename}") + if info.compress_type != zipfile.ZIP_STORED: + raise DDUFCorruptedFileError("Data must not be compressed in DDUF file.") + + try: + _validate_dduf_entry_name(info.filename) + except DDUFInvalidEntryNameError as e: + raise DDUFCorruptedFileError(f"Invalid entry name in DDUF file: {info.filename}") from e + + offset = _get_data_offset(zf, info) + + entries[info.filename] = DDUFEntry( + filename=info.filename, offset=offset, length=info.file_size, dduf_path=dduf_path + ) + + # Consistency checks on the DDUF file + if "model_index.json" not in entries: + raise DDUFCorruptedFileError("Missing required 'model_index.json' entry in DDUF file.") + index = json.loads(entries["model_index.json"].read_text()) + _validate_dduf_structure(index, entries.keys()) + + logger.info(f"Done reading DDUF file {dduf_path}. Found {len(entries)} entries") + return entries + + +def export_entries_as_dduf( + dduf_path: Union[str, os.PathLike], entries: Iterable[Tuple[str, Union[str, Path, bytes]]] +) -> None: + """Write a DDUF file from an iterable of entries. + + This is a lower-level helper than [`export_folder_as_dduf`] that allows more flexibility when serializing data. + In particular, you don't need to save the data on disk before exporting it in the DDUF file. + + Args: + dduf_path (`str` or `os.PathLike`): + The path to the DDUF file to write. + entries (`Iterable[Tuple[str, Union[str, Path, bytes]]]`): + An iterable of entries to write in the DDUF file. Each entry is a tuple with the filename and the content. + The filename should be the path to the file in the DDUF archive. + The content can be a string or a pathlib.Path representing a path to a file on the local disk or directly the content as bytes. + + Raises: + - [`DDUFExportError`]: If anything goes wrong during the export (e.g. invalid entry name, missing 'model_index.json', etc.). + + Example: + ```python + # Export specific files from the local disk. + >>> from huggingface_hub import export_entries_as_dduf + >>> export_entries_as_dduf( + ... dduf_path="stable-diffusion-v1-4-FP16.dduf", + ... entries=[ # List entries to add to the DDUF file (here, only FP16 weights) + ... ("model_index.json", "path/to/model_index.json"), + ... ("vae/config.json", "path/to/vae/config.json"), + ... ("vae/diffusion_pytorch_model.fp16.safetensors", "path/to/vae/diffusion_pytorch_model.fp16.safetensors"), + ... ("text_encoder/config.json", "path/to/text_encoder/config.json"), + ... ("text_encoder/model.fp16.safetensors", "path/to/text_encoder/model.fp16.safetensors"), + ... # ... add more entries here + ... 
] + ... ) + ``` + + ```python + # Export state_dicts one by one from a loaded pipeline + >>> from diffusers import DiffusionPipeline + >>> from typing import Generator, Tuple + >>> import safetensors.torch + >>> from huggingface_hub import export_entries_as_dduf + >>> pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4") + ... # ... do some work with the pipeline + + >>> def as_entries(pipe: DiffusionPipeline) -> Generator[Tuple[str, bytes], None, None]: + ... # Build a generator that yields the entries to add to the DDUF file. + ... # The first element of the tuple is the filename in the DDUF archive (must use UNIX separator!). The second element is the content of the file. + ... # Entries will be evaluated lazily when the DDUF file is created (only 1 entry is loaded in memory at a time) + ... yield "vae/config.json", pipe.vae.to_json_string().encode() + ... yield "vae/diffusion_pytorch_model.safetensors", safetensors.torch.save(pipe.vae.state_dict()) + ... yield "text_encoder/config.json", pipe.text_encoder.config.to_json_string().encode() + ... yield "text_encoder/model.safetensors", safetensors.torch.save(pipe.text_encoder.state_dict()) + ... # ... add more entries here + + >>> export_entries_as_dduf(dduf_path="stable-diffusion-v1-4.dduf", entries=as_entries(pipe)) + ``` + """ + logger.info(f"Exporting DDUF file '{dduf_path}'") + filenames = set() + index = None + with zipfile.ZipFile(str(dduf_path), "w", zipfile.ZIP_STORED) as archive: + for filename, content in entries: + if filename in filenames: + raise DDUFExportError(f"Can't add duplicate entry: {filename}") + filenames.add(filename) + + if filename == "model_index.json": + try: + index = json.loads(_load_content(content).decode()) + except json.JSONDecodeError as e: + raise DDUFExportError("Failed to parse 'model_index.json'.") from e + + try: + filename = _validate_dduf_entry_name(filename) + except DDUFInvalidEntryNameError as e: + raise DDUFExportError(f"Invalid entry name: {filename}") from e + logger.debug(f"Adding entry '{filename}' to DDUF file") + _dump_content_in_archive(archive, filename, content) + + # Consistency checks on the DDUF file + if index is None: + raise DDUFExportError("Missing required 'model_index.json' entry in DDUF file.") + try: + _validate_dduf_structure(index, filenames) + except DDUFCorruptedFileError as e: + raise DDUFExportError("Invalid DDUF file structure.") from e + + logger.info(f"Done writing DDUF file {dduf_path}") + + +def export_folder_as_dduf(dduf_path: Union[str, os.PathLike], folder_path: Union[str, os.PathLike]) -> None: + """ + Export a folder as a DDUF file. + + Uses [`export_entries_as_dduf`] under the hood. + + Args: + dduf_path (`str` or `os.PathLike`): + The path to the DDUF file to write. + folder_path (`str` or `os.PathLike`): + The path to the folder containing the diffusion model. 
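+ (Editor's note: only files with an allowed DDUF extension (".json", ".model", ".safetensors", ".txt") located at most one directory level deep are exported; other files are silently skipped, as implemented in `_iterate_over_folder` below.)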
+ + Example: + ```python + >>> from huggingface_hub import export_folder_as_dduf + >>> export_folder_as_dduf(dduf_path="FLUX.1-dev.dduf", folder_path="path/to/FLUX.1-dev") + ``` + """ + folder_path = Path(folder_path) + + def _iterate_over_folder() -> Iterable[Tuple[str, Path]]: + for path in Path(folder_path).glob("**/*"): + if not path.is_file(): + continue + if path.suffix not in DDUF_ALLOWED_ENTRIES: + logger.debug(f"Skipping file '{path}' (file type not allowed)") + continue + path_in_archive = path.relative_to(folder_path) + if len(path_in_archive.parts) >= 3: + logger.debug(f"Skipping file '{path}' (nested directories not allowed)") + continue + yield path_in_archive.as_posix(), path + + export_entries_as_dduf(dduf_path, _iterate_over_folder()) + + +def _dump_content_in_archive(archive: zipfile.ZipFile, filename: str, content: Union[str, os.PathLike, bytes]) -> None: + with archive.open(filename, "w", force_zip64=True) as archive_fh: + if isinstance(content, (str, Path)): + content_path = Path(content) + with content_path.open("rb") as content_fh: + shutil.copyfileobj(content_fh, archive_fh, 1024 * 1024 * 8) # type: ignore[misc] + elif isinstance(content, bytes): + archive_fh.write(content) + else: + raise DDUFExportError(f"Invalid content type for {filename}. Must be str, Path or bytes.") + + +def _load_content(content: Union[str, Path, bytes]) -> bytes: + """Load the content of an entry as bytes. + + Used only for small checks (not to dump content into archive). + """ + if isinstance(content, (str, Path)): + return Path(content).read_bytes() + elif isinstance(content, bytes): + return content + else: + raise DDUFExportError(f"Invalid content type. Must be str, Path or bytes. Got {type(content)}.") + + +def _validate_dduf_entry_name(entry_name: str) -> str: + if "." + entry_name.split(".")[-1] not in DDUF_ALLOWED_ENTRIES: + raise DDUFInvalidEntryNameError(f"File type not allowed: {entry_name}") + if "\\" in entry_name: + raise DDUFInvalidEntryNameError(f"Entry names must use UNIX separators ('/'). Got {entry_name}.") + entry_name = entry_name.strip("/") + if entry_name.count("/") > 1: + raise DDUFInvalidEntryNameError(f"DDUF only supports 1 level of directory. Got {entry_name}.") + return entry_name + + +def _validate_dduf_structure(index: Any, entry_names: Iterable[str]) -> None: + """ + Consistency checks on the DDUF file structure. + + Rules: + - The 'model_index.json' entry is required and must contain a dictionary. + - Each folder name must correspond to an entry in 'model_index.json'. + - Each folder must contain at least a config file ('config.json', 'tokenizer_config.json', 'preprocessor_config.json', 'scheduler_config.json'). + + Args: + index (Any): + The content of the 'model_index.json' entry. + entry_names (Iterable[str]): + The list of entry names in the DDUF file. + + Raises: + - [`DDUFCorruptedFileError`]: If the DDUF file is corrupted (i.e. doesn't follow the DDUF format). + """ + if not isinstance(index, dict): + raise DDUFCorruptedFileError(f"Invalid 'model_index.json' content. Must be a dictionary. Got {type(index)}.") + + dduf_folders = {entry.split("/")[0] for entry in entry_names if "/" in entry} + for folder in dduf_folders: + if folder not in index: + raise DDUFCorruptedFileError(f"Missing required entry '{folder}' in 'model_index.json'.") + if not any(f"{folder}/{required_entry}" in entry_names for required_entry in DDUF_FOLDER_REQUIRED_ENTRIES): + raise DDUFCorruptedFileError( + f"Missing required file in folder '{folder}'. 
Must contain at least one of {DDUF_FOLDER_REQUIRED_ENTRIES}." + ) + + +def _get_data_offset(zf: zipfile.ZipFile, info: zipfile.ZipInfo) -> int: + """ + Calculate the data offset for a file in a ZIP archive. + + Args: + zf (`zipfile.ZipFile`): + The opened ZIP file. Must be opened in read mode. + info (`zipfile.ZipInfo`): + The file info. + + Returns: + int: The offset of the file data in the ZIP archive. + """ + if zf.fp is None: + raise DDUFCorruptedFileError("ZipFile object must be opened in read mode.") + + # Step 1: Get the local file header offset + header_offset = info.header_offset + + # Step 2: Read the local file header + zf.fp.seek(header_offset) + local_file_header = zf.fp.read(30) # Fixed-size part of the local header + + if len(local_file_header) < 30: + raise DDUFCorruptedFileError("Incomplete local file header.") + + # Step 3: Parse the header fields to calculate the start of file data + # Local file header: https://en.wikipedia.org/wiki/ZIP_(file_format)#File_headers + filename_len = int.from_bytes(local_file_header[26:28], "little") + extra_field_len = int.from_bytes(local_file_header[28:30], "little") + + # Data offset is after the fixed header, filename, and extra fields + data_offset = header_offset + 30 + filename_len + extra_field_len + + return data_offset diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/_tensorflow.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/_tensorflow.py new file mode 100644 index 0000000000000000000000000000000000000000..1173e34a28b2d7f9d879e01ffdae8ce09e9d5b5c --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/_tensorflow.py @@ -0,0 +1,92 @@ +# Copyright 2024 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains tensorflow-specific helpers.""" + +import math +import re +from typing import TYPE_CHECKING, Dict, Union + +from .. import constants +from ._base import MAX_SHARD_SIZE, StateDictSplit, split_state_dict_into_shards_factory + + +if TYPE_CHECKING: + import tensorflow as tf + + +def split_tf_state_dict_into_shards( + state_dict: Dict[str, "tf.Tensor"], + *, + filename_pattern: str = constants.TF2_WEIGHTS_FILE_PATTERN, + max_shard_size: Union[int, str] = MAX_SHARD_SIZE, +) -> StateDictSplit: + """ + Split a model state dictionary in shards so that each shard is smaller than a given size. + + The shards are determined by iterating through the `state_dict` in the order of its keys. There is no optimization + made to make each shard as close as possible to the maximum size passed. For example, if the limit is 10GB and we + have tensors of sizes [6GB, 6GB, 2GB, 6GB, 2GB, 2GB] they will get sharded as [6GB], [6+2GB], [6+2+2GB] and not + [6+2+2GB], [6+2GB], [6GB]. + + > [!WARNING] + > If one of the model's tensor is bigger than `max_shard_size`, it will end up in its own shard which will have a + > size greater than `max_shard_size`. 
+ + Args: + state_dict (`Dict[str, Tensor]`): + The state dictionary to save. + filename_pattern (`str`, *optional*): + The pattern to generate the files names in which the model will be saved. Pattern must be a string that + can be formatted with `filename_pattern.format(suffix=...)` and must contain the keyword `suffix` + Defaults to `"tf_model{suffix}.h5"`. + max_shard_size (`int` or `str`, *optional*): + The maximum size of each shard, in bytes. Defaults to 5GB. + + Returns: + [`StateDictSplit`]: A `StateDictSplit` object containing the shards and the index to retrieve them. + """ + return split_state_dict_into_shards_factory( + state_dict, + max_shard_size=max_shard_size, + filename_pattern=filename_pattern, + get_storage_size=get_tf_storage_size, + ) + + +def get_tf_storage_size(tensor: "tf.Tensor") -> int: + # Return `math.ceil` since dtype byte size can be a float (e.g., 0.125 for tf.bool). + # Better to overestimate than underestimate. + return math.ceil(tensor.numpy().size * _dtype_byte_size_tf(tensor.dtype)) + + +def _dtype_byte_size_tf(dtype) -> float: + """ + Returns the size (in bytes) occupied by one parameter of type `dtype`. + Taken from https://github.com/huggingface/transformers/blob/74d9d0cebb0263a3f8ab9c280569170cc74651d0/src/transformers/modeling_tf_utils.py#L608. + NOTE: why not `tensor.numpy().nbytes`? + Example: + ```py + >>> _dtype_byte_size(tf.float32) + 4 + ``` + """ + import tensorflow as tf + + if dtype == tf.bool: + return 1 / 8 + bit_search = re.search(r"[^\d](\d+)$", dtype.name) + if bit_search is None: + raise ValueError(f"`dtype` is not a valid dtype: {dtype}.") + bit_size = int(bit_search.groups()[0]) + return bit_size // 8 diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/_torch.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/_torch.py new file mode 100644 index 0000000000000000000000000000000000000000..e24d46ab4e14415104922681cd64944a33a3d9ab --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/_torch.py @@ -0,0 +1,1015 @@ +# Copyright 2024 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains pytorch-specific helpers.""" + +import importlib +import json +import os +import re +from collections import defaultdict, namedtuple +from functools import lru_cache +from pathlib import Path +from typing import TYPE_CHECKING, Any, Dict, Iterable, List, NamedTuple, Optional, Set, Tuple, Union + +from packaging import version + +from .. 
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/_torch.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/_torch.py new file mode 100644 index 0000000000000000000000000000000000000000..e24d46ab4e14415104922681cd64944a33a3d9ab --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/serialization/_torch.py @@ -0,0 +1,1015 @@ +# Copyright 2024 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains pytorch-specific helpers.""" + +import importlib +import json +import os +import re +from collections import defaultdict, namedtuple +from functools import lru_cache +from pathlib import Path +from typing import TYPE_CHECKING, Any, Dict, Iterable, List, NamedTuple, Optional, Set, Tuple, Union + +from packaging import version + +from .. import constants, logging +from ._base import MAX_SHARD_SIZE, StateDictSplit, split_state_dict_into_shards_factory + + +logger = logging.get_logger(__file__) + +if TYPE_CHECKING: + import torch + +# SAVING + + +def save_torch_model( + model: "torch.nn.Module", + save_directory: Union[str, Path], + *, + filename_pattern: Optional[str] = None, + force_contiguous: bool = True, + max_shard_size: Union[int, str] = MAX_SHARD_SIZE, + metadata: Optional[Dict[str, str]] = None, + safe_serialization: bool = True, + is_main_process: bool = True, + shared_tensors_to_discard: Optional[List[str]] = None, +): + """ + Saves a given torch model to disk, handling sharding and shared tensors issues. + + See also [`save_torch_state_dict`] to save a state dict with more flexibility. + + For more information about tensor sharing, check out [this guide](https://huggingface.co/docs/safetensors/torch_shared_tensors). + + The model state dictionary is split into shards so that each shard is smaller than a given size. The shards are + saved in the `save_directory` with the given `filename_pattern`. If the model is too big to fit in a single shard, + an index file is saved in the `save_directory` to indicate where each tensor is saved. This helper uses + [`split_torch_state_dict_into_shards`] under the hood. If `safe_serialization` is `True`, the shards are saved as + safetensors (the default). Otherwise, the shards are saved as pickle. + + Before saving the model, the `save_directory` is cleaned of any previous shard files. + + > [!WARNING] + > If one of the model's tensors is bigger than `max_shard_size`, it will end up in its own shard which will have a + > size greater than `max_shard_size`. + + > [!WARNING] + > If your model is a `transformers.PreTrainedModel`, you should pass `model._tied_weights_keys` as `shared_tensors_to_discard` to properly handle shared tensors saving. This ensures the correct duplicate tensors are discarded during saving. + + Args: + model (`torch.nn.Module`): + The model to save on disk. + save_directory (`str` or `Path`): + The directory in which the model will be saved. + filename_pattern (`str`, *optional*): + The pattern to generate the file names in which the model will be saved. Pattern must be a string that + can be formatted with `filename_pattern.format(suffix=...)` and must contain the keyword `suffix`. + Defaults to `"model{suffix}.safetensors"` or `"pytorch_model{suffix}.bin"` depending on the `safe_serialization` + parameter. + force_contiguous (`boolean`, *optional*): + Whether to force the `state_dict` to be saved as contiguous tensors. This has no effect on the correctness of the + model, but it could potentially change performance if the layout of the tensor was chosen specifically for + that reason. Defaults to `True`. + max_shard_size (`int` or `str`, *optional*): + The maximum size of each shard, in bytes. Defaults to 5GB. + metadata (`Dict[str, str]`, *optional*): + Extra information to save along with the model. Some metadata will be added for each dropped tensor. + This information will not be enough to recover the entire shared structure but might help in understanding + things. + safe_serialization (`bool`, *optional*): + Whether to save as safetensors, which is the default behavior. If `False`, the shards are saved as pickle. + Safe serialization is recommended for security reasons. Saving as pickle is deprecated and will be removed + in a future version. + is_main_process (`bool`, *optional*): + Whether the process calling this is the main process or not. Useful in distributed training (e.g., on + TPUs) when this function needs to be called from all processes. In this case, set `is_main_process=True` only on + the main process to avoid race conditions. Defaults to `True`. + shared_tensors_to_discard (`List[str]`, *optional*): + List of tensor names to drop when saving shared tensors. If not provided and shared tensors are + detected, it will drop the first name alphabetically. + + Example: + + ```py + >>> from huggingface_hub import save_torch_model + >>> model = ... # A PyTorch model + + # Save state dict to "path/to/folder". The model will be split into shards of 5GB each and saved as safetensors. + >>> save_torch_model(model, "path/to/folder") + + # Load model back + >>> from huggingface_hub import load_torch_model # TODO + >>> load_torch_model(model, "path/to/folder") + ``` + """ + save_torch_state_dict( + state_dict=model.state_dict(), + filename_pattern=filename_pattern, + force_contiguous=force_contiguous, + max_shard_size=max_shard_size, + metadata=metadata, + safe_serialization=safe_serialization, + save_directory=save_directory, + is_main_process=is_main_process, + shared_tensors_to_discard=shared_tensors_to_discard, + )
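Following the warning about tied weights, a short usage sketch (hedged: `model` here is a hypothetical `transformers`-style module, and `_tied_weights_keys` may be absent or `None`):

```python
from huggingface_hub import save_torch_model

# Discard tied duplicates explicitly instead of relying on the alphabetical default.
save_torch_model(
    model,  # hypothetical torch.nn.Module with tied embeddings
    "path/to/folder",
    shared_tensors_to_discard=getattr(model, "_tied_weights_keys", None),
)
```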
+ + +def save_torch_state_dict( + state_dict: Dict[str, "torch.Tensor"], + save_directory: Union[str, Path], + *, + filename_pattern: Optional[str] = None, + force_contiguous: bool = True, + max_shard_size: Union[int, str] = MAX_SHARD_SIZE, + metadata: Optional[Dict[str, str]] = None, + safe_serialization: bool = True, + is_main_process: bool = True, + shared_tensors_to_discard: Optional[List[str]] = None, +) -> None: + """ + Save a model state dictionary to disk, handling sharding and shared tensors issues. + + See also [`save_torch_model`] to directly save a PyTorch model. + + For more information about tensor sharing, check out [this guide](https://huggingface.co/docs/safetensors/torch_shared_tensors). + + The model state dictionary is split into shards so that each shard is smaller than a given size. The shards are + saved in the `save_directory` with the given `filename_pattern`. If the model is too big to fit in a single shard, + an index file is saved in the `save_directory` to indicate where each tensor is saved. This helper uses + [`split_torch_state_dict_into_shards`] under the hood. If `safe_serialization` is `True`, the shards are saved as + safetensors (the default). Otherwise, the shards are saved as pickle. + + Before saving the model, the `save_directory` is cleaned of any previous shard files. + + > [!WARNING] + > If one of the model's tensors is bigger than `max_shard_size`, it will end up in its own shard which will have a + > size greater than `max_shard_size`. + + > [!WARNING] + > If your model is a `transformers.PreTrainedModel`, you should pass `model._tied_weights_keys` as `shared_tensors_to_discard` to properly handle shared tensors saving. This ensures the correct duplicate tensors are discarded during saving. + + Args: + state_dict (`Dict[str, torch.Tensor]`): + The state dictionary to save. + save_directory (`str` or `Path`): + The directory in which the model will be saved. + filename_pattern (`str`, *optional*): + The pattern to generate the file names in which the model will be saved. Pattern must be a string that + can be formatted with `filename_pattern.format(suffix=...)` and must contain the keyword `suffix`. + Defaults to `"model{suffix}.safetensors"` or `"pytorch_model{suffix}.bin"` depending on the `safe_serialization` + parameter.
+ force_contiguous (`boolean`, *optional*): + Whether to force the `state_dict` to be saved as contiguous tensors. This has no effect on the correctness of the + model, but it could potentially change performance if the layout of the tensor was chosen specifically for + that reason. Defaults to `True`. + max_shard_size (`int` or `str`, *optional*): + The maximum size of each shard, in bytes. Defaults to 5GB. + metadata (`Dict[str, str]`, *optional*): + Extra information to save along with the model. Some metadata will be added for each dropped tensor. + This information will not be enough to recover the entire shared structure but might help in understanding + things. + safe_serialization (`bool`, *optional*): + Whether to save as safetensors, which is the default behavior. If `False`, the shards are saved as pickle. + Safe serialization is recommended for security reasons. Saving as pickle is deprecated and will be removed + in a future version. + is_main_process (`bool`, *optional*): + Whether the process calling this is the main process or not. Useful in distributed training (e.g., on + TPUs) when this function needs to be called from all processes. In this case, set `is_main_process=True` only on + the main process to avoid race conditions. Defaults to `True`. + shared_tensors_to_discard (`List[str]`, *optional*): + List of tensor names to drop when saving shared tensors. If not provided and shared tensors are + detected, it will drop the first name alphabetically. + + Example: + + ```py + >>> from huggingface_hub import save_torch_state_dict + >>> model = ... # A PyTorch model + + # Save state dict to "path/to/folder". The model will be split into shards of 5GB each and saved as safetensors. + >>> state_dict = model.state_dict() + >>> save_torch_state_dict(state_dict, "path/to/folder") + ``` + """ + save_directory = str(save_directory) + + if filename_pattern is None: + filename_pattern = ( + constants.SAFETENSORS_WEIGHTS_FILE_PATTERN + if safe_serialization + else constants.PYTORCH_WEIGHTS_FILE_PATTERN + ) + + if metadata is None: + metadata = {} + if safe_serialization: + try: + from safetensors.torch import save_file as save_file_fn + except ImportError as e: + raise ImportError( + "Please install `safetensors` to use safe serialization. " + "You can install it with `pip install safetensors`." + ) from e + # Clean state dict for safetensors + state_dict = _clean_state_dict_for_safetensors( + state_dict, + metadata, + force_contiguous=force_contiguous, + shared_tensors_to_discard=shared_tensors_to_discard, + ) + else: + from torch import save as save_file_fn # type: ignore[assignment, no-redef] + + logger.warning( + "You are using unsafe serialization. Due to security reasons, it is recommended not to load " + "pickled models from untrusted sources. If you intend to share your model, we strongly recommend " + "using safe serialization by installing `safetensors` with `pip install safetensors`."
+ ) + # Split dict + state_dict_split = split_torch_state_dict_into_shards( + state_dict, filename_pattern=filename_pattern, max_shard_size=max_shard_size + ) + + # Only main process should clean up existing files to avoid race conditions in distributed environment + if is_main_process: + existing_files_regex = re.compile(filename_pattern.format(suffix=r"(-\d{5}-of-\d{5})?") + r"(\.index\.json)?") + for filename in os.listdir(save_directory): + if existing_files_regex.match(filename): + try: + logger.debug(f"Removing existing file '{filename}' from folder.") + os.remove(os.path.join(save_directory, filename)) + except Exception as e: + logger.warning( + f"Error when trying to remove existing '{filename}' from folder: {e}. Continuing..." + ) + + # Save each shard + per_file_metadata = {"format": "pt"} + if not state_dict_split.is_sharded: + per_file_metadata.update(metadata) + safe_file_kwargs = {"metadata": per_file_metadata} if safe_serialization else {} + for filename, tensors in state_dict_split.filename_to_tensors.items(): + shard = {tensor: state_dict[tensor] for tensor in tensors} + save_file_fn(shard, os.path.join(save_directory, filename), **safe_file_kwargs) # ty: ignore[invalid-argument-type] + logger.debug(f"Shard saved to {filename}") + + # Save the index (if any) + if state_dict_split.is_sharded: + index_path = filename_pattern.format(suffix="") + ".index.json" + index = { + "metadata": {**state_dict_split.metadata, **metadata}, + "weight_map": state_dict_split.tensor_to_filename, + } + with open(os.path.join(save_directory, index_path), "w") as f: + json.dump(index, f, indent=2) + logger.info( + f"The model is bigger than the maximum size per checkpoint ({max_shard_size}). " + f"Model weights have been saved in {len(state_dict_split.filename_to_tensors)} checkpoint shards. " + f"You can find where each parameter has been saved in the index located at {index_path}." + ) + + logger.info(f"Model weights successfully saved to {save_directory}!")
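The cleanup regex and the index filename above are both derived from `filename_pattern`; a quick, runnable sketch of the names involved for the default safetensors pattern:

```python
import re

pattern = "model{suffix}.safetensors"
print(pattern.format(suffix=""))  # model.safetensors (single shard)
print(pattern.format(suffix="-00001-of-00002"))  # model-00001-of-00002.safetensors
print(pattern.format(suffix="") + ".index.json")  # model.safetensors.index.json

# Same regex as in the cleanup step: matches shards, single files, and the index.
existing = re.compile(pattern.format(suffix=r"(-\d{5}-of-\d{5})?") + r"(\.index\.json)?")
assert existing.match("model-00001-of-00002.safetensors")
assert existing.match("model.safetensors.index.json")
```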
+ + +def split_torch_state_dict_into_shards( + state_dict: Dict[str, "torch.Tensor"], + *, + filename_pattern: str = constants.SAFETENSORS_WEIGHTS_FILE_PATTERN, + max_shard_size: Union[int, str] = MAX_SHARD_SIZE, +) -> StateDictSplit: + """ + Split a model state dictionary into shards so that each shard is smaller than a given size. + + The shards are determined by iterating through the `state_dict` in the order of its keys. There is no optimization + made to make each shard as close as possible to the maximum size passed. For example, if the limit is 10GB and we + have tensors of sizes [6GB, 6GB, 2GB, 6GB, 2GB, 2GB] they will get sharded as [6GB], [6+2GB], [6+2+2GB] and not + [6+2+2GB], [6+2GB], [6GB]. + + + > [!TIP] + > To save a model state dictionary to disk, see [`save_torch_state_dict`]. This helper uses + > `split_torch_state_dict_into_shards` under the hood. + + > [!WARNING] + > If one of the model's tensors is bigger than `max_shard_size`, it will end up in its own shard which will have a + > size greater than `max_shard_size`. + + Args: + state_dict (`Dict[str, torch.Tensor]`): + The state dictionary to save. + filename_pattern (`str`, *optional*): + The pattern to generate the file names in which the model will be saved. Pattern must be a string that + can be formatted with `filename_pattern.format(suffix=...)` and must contain the keyword `suffix`. + Defaults to `"model{suffix}.safetensors"`. + max_shard_size (`int` or `str`, *optional*): + The maximum size of each shard, in bytes. Defaults to 5GB. + + Returns: + [`StateDictSplit`]: A `StateDictSplit` object containing the shards and the index to retrieve them. + + Example: + ```py + >>> import json + >>> import os + >>> from safetensors.torch import save_file as safe_save_file + >>> from huggingface_hub import split_torch_state_dict_into_shards + + >>> def save_state_dict(state_dict: Dict[str, torch.Tensor], save_directory: str): + ... state_dict_split = split_torch_state_dict_into_shards(state_dict) + ... for filename, tensors in state_dict_split.filename_to_tensors.items(): + ... shard = {tensor: state_dict[tensor] for tensor in tensors} + ... safe_save_file( + ... shard, + ... os.path.join(save_directory, filename), + ... metadata={"format": "pt"}, + ... ) + ... if state_dict_split.is_sharded: + ... index = { + ... "metadata": state_dict_split.metadata, + ... "weight_map": state_dict_split.tensor_to_filename, + ... } + ... with open(os.path.join(save_directory, "model.safetensors.index.json"), "w") as f: + ... f.write(json.dumps(index, indent=2)) + ``` + """ + return split_state_dict_into_shards_factory( + state_dict, + max_shard_size=max_shard_size, + filename_pattern=filename_pattern, + get_storage_size=get_torch_storage_size, + get_storage_id=get_torch_storage_id, + ) + + +# LOADING + + +def load_torch_model( + model: "torch.nn.Module", + checkpoint_path: Union[str, os.PathLike], + *, + strict: bool = False, + safe: bool = True, + weights_only: bool = False, + map_location: Optional[Union[str, "torch.device"]] = None, + mmap: bool = False, + filename_pattern: Optional[str] = None, +) -> NamedTuple: + """ + Load a checkpoint into a model, handling both sharded and non-sharded checkpoints. + + Args: + model (`torch.nn.Module`): + The model in which to load the checkpoint. + checkpoint_path (`str` or `os.PathLike`): + Path to either the checkpoint file or directory containing the checkpoint(s). + strict (`bool`, *optional*, defaults to `False`): + Whether to strictly enforce that the keys in the model state dict match the keys in the checkpoint. + safe (`bool`, *optional*, defaults to `True`): + If `safe` is `True`, the safetensors files will be loaded. If `safe` is `False`, the function + will first attempt to load safetensors files if they are available, otherwise it will fall back to loading + pickle files. The `filename_pattern` parameter takes precedence over the `safe` parameter. + weights_only (`bool`, *optional*, defaults to `False`): + If `True`, only loads the model weights without optimizer states and other metadata. + Only supported in PyTorch >= 1.13. + map_location (`str` or `torch.device`, *optional*): + A `torch.device` object, string or a dict specifying how to remap storage locations. It + indicates the location where all tensors should be loaded. + mmap (`bool`, *optional*, defaults to `False`): + Whether to use memory-mapped file loading. Memory mapping can improve loading performance + for large models in PyTorch >= 2.1.0 with zipfile-based checkpoints. + filename_pattern (`str`, *optional*): + The pattern to look for the index file. Pattern must be a string that + can be formatted with `filename_pattern.format(suffix=...)` and must contain the keyword `suffix`. + Defaults to `"model{suffix}.safetensors"`. + Returns: + `NamedTuple`: A named tuple with `missing_keys` and `unexpected_keys` fields. + - `missing_keys` is a list of str containing the missing keys, i.e. keys that are in the model but not in the checkpoint. + - `unexpected_keys` is a list of str containing the unexpected keys, i.e.
keys that are in the checkpoint but not in the model. + + Raises: + [`FileNotFoundError`](https://docs.python.org/3/library/exceptions.html#FileNotFoundError) + If the checkpoint file or directory does not exist. + [`ImportError`](https://docs.python.org/3/library/exceptions.html#ImportError) + If safetensors or torch is not installed when trying to load a .safetensors file or a PyTorch checkpoint, respectively. + [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError) + If the checkpoint path is invalid or if the checkpoint format cannot be determined. + + Example: + ```python + >>> from huggingface_hub import load_torch_model + >>> model = ... # A PyTorch model + >>> load_torch_model(model, "path/to/checkpoint") + ``` + """ + checkpoint_path = Path(checkpoint_path) + + if not checkpoint_path.exists(): + raise ValueError(f"Checkpoint path {checkpoint_path} does not exist") + # 1. Check if checkpoint is a single file + if checkpoint_path.is_file(): + state_dict = load_state_dict_from_file( + checkpoint_file=checkpoint_path, + map_location=map_location, + weights_only=weights_only, + ) + return model.load_state_dict(state_dict, strict=strict) + + # 2. If not, checkpoint_path is a directory + if filename_pattern is None: + filename_pattern = constants.SAFETENSORS_WEIGHTS_FILE_PATTERN + index_path = checkpoint_path / (filename_pattern.format(suffix="") + ".index.json") + # Only fall back to pickle format if safetensors index is not found and safe is False. + if not index_path.is_file() and not safe: + filename_pattern = constants.PYTORCH_WEIGHTS_FILE_PATTERN + + index_path = checkpoint_path / (filename_pattern.format(suffix="") + ".index.json") + + if index_path.is_file(): + return _load_sharded_checkpoint( + model=model, + save_directory=checkpoint_path, + strict=strict, + weights_only=weights_only, + filename_pattern=filename_pattern, + ) + + # Look for single model file + model_files = list(checkpoint_path.glob("*.safetensors" if safe else "*.bin")) + if len(model_files) == 1: + state_dict = load_state_dict_from_file( + checkpoint_file=model_files[0], + map_location=map_location, + weights_only=weights_only, + mmap=mmap, + ) + return model.load_state_dict(state_dict, strict=strict) + + raise ValueError( + f"Directory '{checkpoint_path}' does not contain a valid checkpoint. " + "Expected either a sharded checkpoint with an index file, or a single model file." + ) + + +def _load_sharded_checkpoint( + model: "torch.nn.Module", + save_directory: os.PathLike, + *, + strict: bool = False, + weights_only: bool = False, + filename_pattern: str = constants.SAFETENSORS_WEIGHTS_FILE_PATTERN, +) -> NamedTuple: + """ + Loads a sharded checkpoint into a model. This is the same as + [`torch.nn.Module.load_state_dict`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html?highlight=load_state_dict#torch.nn.Module.load_state_dict) + but for a sharded checkpoint. Each shard is loaded one by one and removed from memory after being loaded into the model. + + Args: + model (`torch.nn.Module`): + The model in which to load the checkpoint. + save_directory (`str` or `os.PathLike`): + A path to a folder containing the sharded checkpoint. + strict (`bool`, *optional*, defaults to `False`): + Whether to strictly enforce that the keys in the model state dict match the keys in the sharded checkpoint. + weights_only (`bool`, *optional*, defaults to `False`): + If `True`, only loads the model weights without optimizer states and other metadata. + Only supported in PyTorch >= 1.13.
+ filename_pattern (`str`, *optional*, defaults to `"model{suffix}.safetensors"`): + The pattern to look for the index file. Pattern must be a string that + can be formatted with `filename_pattern.format(suffix=...)` and must contain the keyword `suffix`. + Defaults to `"model{suffix}.safetensors"`. + + Returns: + `NamedTuple`: A named tuple with `missing_keys` and `unexpected_keys` fields: + - `missing_keys` is a list of str containing the missing keys + - `unexpected_keys` is a list of str containing the unexpected keys + """ + + # 1. Load and validate index file + # The index file contains a mapping of parameter names to shard files + index_path = filename_pattern.format(suffix="") + ".index.json" + index_file = os.path.join(save_directory, index_path) + with open(index_file, "r", encoding="utf-8") as f: + index = json.load(f) + + # 2. Validate keys if in strict mode + # This is done before loading any shards to fail fast + if strict: + _validate_keys_for_strict_loading(model, index["weight_map"].keys()) + + # 3. Load each shard using `load_state_dict` + # Get unique shard files (multiple parameters can be in the same shard) + shard_files = list(set(index["weight_map"].values())) + for shard_file in shard_files: + # Load shard into memory + shard_path = os.path.join(save_directory, shard_file) + state_dict = load_state_dict_from_file( + shard_path, + map_location="cpu", + weights_only=weights_only, + ) + # Update model with parameters from this shard + model.load_state_dict(state_dict, strict=strict) + # Explicitly remove the state dict from memory + del state_dict + + # 4. Return compatibility info + loaded_keys = set(index["weight_map"].keys()) + model_keys = set(model.state_dict().keys()) + return _IncompatibleKeys( + missing_keys=list(model_keys - loaded_keys), unexpected_keys=list(loaded_keys - model_keys) + )
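For reference, the index file consumed above is a small JSON document with two top-level keys, `metadata` and `weight_map`. A hypothetical two-shard example (the tensor names and the `total_size` value are illustrative only):

```python
# Shape of "model.safetensors.index.json", written out as a Python dict:
index = {
    "metadata": {"total_size": 7516192768},  # illustrative extra metadata
    "weight_map": {
        "embeddings.weight": "model-00001-of-00002.safetensors",
        "lm_head.weight": "model-00002-of-00002.safetensors",
    },
}
```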
+ + +def load_state_dict_from_file( + checkpoint_file: Union[str, os.PathLike], + map_location: Optional[Union[str, "torch.device"]] = None, + weights_only: bool = False, + mmap: bool = False, +) -> Union[Dict[str, "torch.Tensor"], Any]: + """ + Loads a checkpoint file, handling both safetensors and pickle checkpoint formats. + + Args: + checkpoint_file (`str` or `os.PathLike`): + Path to the checkpoint file to load. Can be either a safetensors or pickle (`.bin`) checkpoint. + map_location (`str` or `torch.device`, *optional*): + A `torch.device` object, string or a dict specifying how to remap storage locations. It + indicates the location where all tensors should be loaded. + weights_only (`bool`, *optional*, defaults to `False`): + If `True`, only loads the model weights without optimizer states and other metadata. + Only supported for pickle (`.bin`) checkpoints with PyTorch >= 1.13. Has no effect when + loading safetensors files. + mmap (`bool`, *optional*, defaults to `False`): + Whether to use memory-mapped file loading. Memory mapping can improve loading performance + for large models in PyTorch >= 2.1.0 with zipfile-based checkpoints. Has no effect when + loading safetensors files, as the `safetensors` library uses memory mapping by default. + + Returns: + `Union[Dict[str, "torch.Tensor"], Any]`: The loaded checkpoint. + - For safetensors files: always returns a dictionary mapping parameter names to tensors. + - For pickle files: returns any Python object that was pickled (commonly a state dict, but could be + an entire model, optimizer state, or any other Python object). + + Raises: + [`FileNotFoundError`](https://docs.python.org/3/library/exceptions.html#FileNotFoundError) + If the checkpoint file does not exist. + [`ImportError`](https://docs.python.org/3/library/exceptions.html#ImportError) + If safetensors or torch is not installed when trying to load a .safetensors file or a PyTorch checkpoint, respectively. + [`OSError`](https://docs.python.org/3/library/exceptions.html#OSError) + If the checkpoint file format is invalid or if git-lfs files are not properly downloaded. + [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError) + If the checkpoint file path is empty or invalid. + + Example: + ```python + >>> from huggingface_hub import load_state_dict_from_file + + # Load a PyTorch checkpoint + >>> state_dict = load_state_dict_from_file("path/to/model.bin", map_location="cpu") + >>> model.load_state_dict(state_dict) + + # Load a safetensors checkpoint + >>> state_dict = load_state_dict_from_file("path/to/model.safetensors") + >>> model.load_state_dict(state_dict) + ``` + """ + checkpoint_path = Path(checkpoint_file) + + # Check if file exists and is a regular file (not a directory) + if not checkpoint_path.is_file(): + raise FileNotFoundError( + f"No checkpoint file found at '{checkpoint_path}'. Please verify the path is correct and " + "the file has been properly downloaded." + ) + + # Load safetensors checkpoint + if checkpoint_path.suffix == ".safetensors": + try: + from safetensors import safe_open + from safetensors.torch import load_file + except ImportError as e: + raise ImportError( + "Please install `safetensors` to load a safetensors checkpoint. " + "You can install it with `pip install safetensors`." + ) from e + + # Check format of the archive + with safe_open(checkpoint_file, framework="pt") as f: # type: ignore[attr-defined] + metadata = f.metadata() + # see comment: https://github.com/huggingface/transformers/blob/3d213b57fe74302e5902d68ed9478c3ad1aaa713/src/transformers/modeling_utils.py#L3966 + if metadata is not None and metadata.get("format") not in ["pt", "mlx"]: + raise OSError( + f"The safetensors archive passed at {checkpoint_file} does not contain valid metadata. Make sure " + "you save your model with the `save_torch_model` method." + ) + device = str(map_location.type) if map_location is not None and hasattr(map_location, "type") else map_location + # meta device is not supported with safetensors, falling back to CPU + if device == "meta": + logger.warning("Meta device is not supported with safetensors. Falling back to CPU device.") + device = "cpu" + return load_file(checkpoint_file, device=device) # type: ignore[arg-type] + # Otherwise, load from pickle + try: + import torch + from torch import load + except ImportError as e: + raise ImportError( + "Please install `torch` to load torch tensors. You can install it with `pip install torch`."
+ ) from e + # Add additional kwargs, mmap is only supported in torch >= 2.1.0 + additional_kwargs = {} + if version.parse(torch.__version__) >= version.parse("2.1.0"): + additional_kwargs["mmap"] = mmap + + # weights_only is only supported in torch >= 1.13.0 + if version.parse(torch.__version__) >= version.parse("1.13.0"): + additional_kwargs["weights_only"] = weights_only + + return load( + checkpoint_file, + map_location=map_location, + **additional_kwargs, + ) + + +# HELPERS + + +def _validate_keys_for_strict_loading( + model: "torch.nn.Module", + loaded_keys: Iterable[str], +) -> None: + """ + Validate that model keys match loaded keys when strict loading is enabled. + + Args: + model: The PyTorch model being loaded + loaded_keys: The keys present in the checkpoint + + Raises: + RuntimeError: If there are missing or unexpected keys in strict mode + """ + loaded_keys_set = set(loaded_keys) + model_keys = set(model.state_dict().keys()) + missing_keys = model_keys - loaded_keys_set # Keys in model but not in checkpoint + unexpected_keys = loaded_keys_set - model_keys # Keys in checkpoint but not in model + + if missing_keys or unexpected_keys: + error_message = f"Error(s) in loading state_dict for {model.__class__.__name__}" + if missing_keys: + str_missing_keys = ",".join([f'"{k}"' for k in sorted(missing_keys)]) + error_message += f"\nMissing key(s): {str_missing_keys}." + if unexpected_keys: + str_unexpected_keys = ",".join([f'"{k}"' for k in sorted(unexpected_keys)]) + error_message += f"\nUnexpected key(s): {str_unexpected_keys}." + raise RuntimeError(error_message) + + +def _get_unique_id(tensor: "torch.Tensor") -> Union[int, Tuple[Any, ...]]: + """Returns a unique id for a plain tensor, or a (potentially nested) tuple of unique ids + for the flattened tensors if the input is a wrapper tensor subclass. + """ + + try: + from torch.distributed.tensor import DTensor + + if isinstance(tensor, DTensor): + local_tensor = tensor.to_local() + return local_tensor.storage().data_ptr() + except ImportError: + pass + + try: + # for torch 2.1 and above we can also handle tensor subclasses + from torch.utils._python_dispatch import is_traceable_wrapper_subclass + + if is_traceable_wrapper_subclass(tensor): + attrs, _ = tensor.__tensor_flatten__() # type: ignore[attr-defined] + return tuple(_get_unique_id(getattr(tensor, attr)) for attr in attrs) + + except ImportError: + # for torch versions below 2.1, we can fall back to the original implementation + pass + + if tensor.device.type == "xla" and is_torch_tpu_available(): + # NOTE: xla tensors don't have storage + # use some other unique id to distinguish. + # this is an XLA tensor, it must be created using torch_xla's + # device. So the following import is safe: + import torch_xla # type: ignore[import] + + unique_id = torch_xla._XLAC._xla_get_tensor_id(tensor) + else: + unique_id = storage_ptr(tensor) + + return unique_id + + +def get_torch_storage_id(tensor: "torch.Tensor") -> Optional[Tuple["torch.device", Union[int, Tuple[Any, ...]], int]]: + """ + Return a unique identifier for a tensor's storage. + + Multiple different tensors can share the same underlying storage. This identifier is + guaranteed to be unique and constant for this tensor's storage during its lifetime. Two tensor storages with + non-overlapping lifetimes may have the same id. + In the case of meta tensors, we return None since we can't tell if they share the same storage.
+ + Taken from https://github.com/huggingface/transformers/blob/1ecf5f7c982d761b4daaa96719d162c324187c64/src/transformers/pytorch_utils.py#L278. + """ + if tensor.device.type == "meta": + return None + else: + return tensor.device, _get_unique_id(tensor), get_torch_storage_size(tensor) + + +def get_torch_storage_size(tensor: "torch.Tensor") -> int: + """ + Taken from https://github.com/huggingface/safetensors/blob/08db34094e9e59e2f9218f2df133b7b4aaff5a99/bindings/python/py_src/safetensors/torch.py#L31C1-L41C59 + """ + try: + from torch.distributed.tensor import DTensor + + if isinstance(tensor, DTensor): + # this returns the size of the FULL tensor in bytes + return tensor.nbytes + except ImportError: + pass + + try: + # for torch 2.1 and above we can also handle tensor subclasses + from torch.utils._python_dispatch import is_traceable_wrapper_subclass + + if is_traceable_wrapper_subclass(tensor): + attrs, _ = tensor.__tensor_flatten__() # type: ignore[attr-defined] + return sum(get_torch_storage_size(getattr(tensor, attr)) for attr in attrs) + except ImportError: + # for torch versions below 2.1, we can fall back to the original implementation + pass + + try: + return tensor.untyped_storage().nbytes() + except AttributeError: + # Fallback for torch==1.10 + try: + return tensor.storage().size() * _get_dtype_size(tensor.dtype) + except NotImplementedError: + # Fallback for meta storage + # On torch >=2.0 this is the tensor size + return tensor.nelement() * _get_dtype_size(tensor.dtype) + + +@lru_cache() +def is_torch_tpu_available(check_device=True): + """ + Checks if `torch_xla` is installed and potentially if a TPU is in the environment. + + Taken from https://github.com/huggingface/transformers/blob/1ecf5f7c982d761b4daaa96719d162c324187c64/src/transformers/utils/import_utils.py#L463. + """ + if importlib.util.find_spec("torch_xla") is not None: + if check_device: + # We need to check if `xla_device` can be found, will raise a RuntimeError if not + try: + import torch_xla.core.xla_model as xm # type: ignore[import] + + _ = xm.xla_device() + return True + except RuntimeError: + return False + return True + return False + + +def storage_ptr(tensor: "torch.Tensor") -> Union[int, Tuple[Any, ...]]: + """ + Taken from https://github.com/huggingface/safetensors/blob/079781fd0dc455ba0fe851e2b4507c33d0c0d407/bindings/python/py_src/safetensors/torch.py#L11. + """ + try: + # for torch 2.1 and above we can also handle tensor subclasses + from torch.utils._python_dispatch import is_traceable_wrapper_subclass + + if is_traceable_wrapper_subclass(tensor): + return _get_unique_id(tensor) # type: ignore + except ImportError: + # for torch versions below 2.1, we can fall back to the original implementation + pass + + try: + return tensor.untyped_storage().data_ptr() + except Exception: + # Fallback for torch==1.10 + try: + return tensor.storage().data_ptr() + except NotImplementedError: + # Fallback for meta storage + return 0
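A small sketch of what these storage helpers detect, assuming `torch` is installed: two views of one storage produce the same identifier, which is exactly how duplicate (shared) tensors are found before export. Note the import targets the private module added in this diff:

```python
import torch

from huggingface_hub.serialization._torch import get_torch_storage_id

base = torch.zeros(4)
alias = base.view(2, 2)  # same underlying storage, different shape
assert get_torch_storage_id(base) == get_torch_storage_id(alias)
assert get_torch_storage_id(base) != get_torch_storage_id(torch.zeros(4))
```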
+ """ + to_removes = _remove_duplicate_names(state_dict, discard_names=shared_tensors_to_discard) + for kept_name, to_remove_group in to_removes.items(): + for to_remove in to_remove_group: + if metadata is None: + metadata = {} + + if to_remove not in metadata: + # Do not override user data + metadata[to_remove] = kept_name + del state_dict[to_remove] + if force_contiguous: + state_dict = {k: v.contiguous() for k, v in state_dict.items()} + return state_dict + + +def _end_ptr(tensor: "torch.Tensor") -> int: + """ + Taken from https://github.com/huggingface/safetensors/blob/079781fd0dc455ba0fe851e2b4507c33d0c0d407/bindings/python/py_src/safetensors/torch.py#L23. + """ + if tensor.nelement(): + stop = tensor.view(-1)[-1].data_ptr() + _get_dtype_size(tensor.dtype) + else: + stop = tensor.data_ptr() + return stop + + +def _filter_shared_not_shared(tensors: List[Set[str]], state_dict: Dict[str, "torch.Tensor"]) -> List[Set[str]]: + """ + Taken from https://github.com/huggingface/safetensors/blob/079781fd0dc455ba0fe851e2b4507c33d0c0d407/bindings/python/py_src/safetensors/torch.py#L44 + """ + filtered_tensors = [] + for shared in tensors: + if len(shared) < 2: + filtered_tensors.append(shared) + continue + + areas = [] + for name in shared: + tensor = state_dict[name] + areas.append((tensor.data_ptr(), _end_ptr(tensor), name)) + areas.sort() + + _, last_stop, last_name = areas[0] + filtered_tensors.append({last_name}) + for start, stop, name in areas[1:]: + if start >= last_stop: + filtered_tensors.append({name}) + else: + filtered_tensors[-1].add(name) + last_stop = stop + + return filtered_tensors + + +def _find_shared_tensors(state_dict: Dict[str, "torch.Tensor"]) -> List[Set[str]]: + """ + Taken from https://github.com/huggingface/safetensors/blob/079781fd0dc455ba0fe851e2b4507c33d0c0d407/bindings/python/py_src/safetensors/torch.py#L69. + """ + import torch + + tensors_dict = defaultdict(set) + for k, v in state_dict.items(): + if v.device != torch.device("meta") and storage_ptr(v) != 0 and get_torch_storage_size(v) != 0: + # Need to add device as key because of multiple GPU. 
+ tensors_dict[(v.device, storage_ptr(v), get_torch_storage_size(v))].add(k) + tensors = list(sorted(tensors_dict.values())) + tensors = _filter_shared_not_shared(tensors, state_dict) + return tensors + + +def _is_complete(tensor: "torch.Tensor") -> bool: + """ + Taken from https://github.com/huggingface/safetensors/blob/079781fd0dc455ba0fe851e2b4507c33d0c0d407/bindings/python/py_src/safetensors/torch.py#L80 + """ + try: + # for torch 2.1 and above we can also handle tensor subclasses + from torch.utils._python_dispatch import is_traceable_wrapper_subclass + + if is_traceable_wrapper_subclass(tensor): + attrs, _ = tensor.__tensor_flatten__() # type: ignore[attr-defined] + return all(_is_complete(getattr(tensor, attr)) for attr in attrs) + except ImportError: + # for torch versions below 2.1, we can fall back to the original implementation + pass + + return tensor.data_ptr() == storage_ptr(tensor) and tensor.nelement() * _get_dtype_size( + tensor.dtype + ) == get_torch_storage_size(tensor) + + +def _remove_duplicate_names( + state_dict: Dict[str, "torch.Tensor"], + *, + preferred_names: Optional[List[str]] = None, + discard_names: Optional[List[str]] = None, +) -> Dict[str, List[str]]: + """ + Taken from https://github.com/huggingface/safetensors/blob/079781fd0dc455ba0fe851e2b4507c33d0c0d407/bindings/python/py_src/safetensors/torch.py#L80 + """ + if preferred_names is None: + preferred_names = [] + unique_preferred_names = set(preferred_names) + if discard_names is None: + discard_names = [] + unique_discard_names = set(discard_names) + + shareds = _find_shared_tensors(state_dict) + to_remove = defaultdict(list) + for shared in shareds: + complete_names = set([name for name in shared if _is_complete(state_dict[name])]) + if not complete_names: + raise RuntimeError( + "Error while trying to find names to remove to save state dict, but found no suitable name to keep" + f" for saving amongst: {shared}. None of them covers the entire storage. Refusing to save/load the model" + " since you could be storing much more memory than needed. Please refer to" + " https://huggingface.co/docs/safetensors/torch_shared_tensors for more information. Or open an" + " issue."
+ ) + + keep_name = sorted(list(complete_names))[0] + + # Mechanism to preferentially select keys to keep + # coming from the on-disk file to allow + # loading models saved with a different choice + # of keep_name + preferred = complete_names.difference(unique_discard_names) + if preferred: + keep_name = sorted(list(preferred))[0] + + if unique_preferred_names: + preferred = unique_preferred_names.intersection(complete_names) + if preferred: + keep_name = sorted(list(preferred))[0] + for name in sorted(shared): + if name != keep_name: + to_remove[keep_name].append(name) + return to_remove + + +@lru_cache() +def _get_dtype_size(dtype: "torch.dtype") -> int: + """ + Taken from https://github.com/huggingface/safetensors/blob/08db34094e9e59e2f9218f2df133b7b4aaff5a99/bindings/python/py_src/safetensors/torch.py#L344 + """ + import torch + + # torch.float8 formats require 2.1; we do not support these dtypes on earlier versions + _float8_e4m3fn = getattr(torch, "float8_e4m3fn", None) + _float8_e5m2 = getattr(torch, "float8_e5m2", None) + _SIZE = { + torch.int64: 8, + torch.float32: 4, + torch.int32: 4, + torch.bfloat16: 2, + torch.float16: 2, + torch.int16: 2, + torch.uint8: 1, + torch.int8: 1, + torch.bool: 1, + torch.float64: 8, + _float8_e4m3fn: 1, + _float8_e5m2: 1, + } + return _SIZE[dtype] + + +class _IncompatibleKeys(namedtuple("IncompatibleKeys", ["missing_keys", "unexpected_keys"])): + """ + This is used to report missing and unexpected keys in the state dict. + Taken from https://github.com/pytorch/pytorch/blob/main/torch/nn/modules/module.py#L52. + + """ + + def __repr__(self) -> str: + if not self.missing_keys and not self.unexpected_keys: + return "" + return super().__repr__() + + __str__ = __repr__ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/templates/datasetcard_template.md b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/templates/datasetcard_template.md new file mode 100644 index 0000000000000000000000000000000000000000..9af29ebbed93653ec74a8952e314e7554323ef15 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/templates/datasetcard_template.md @@ -0,0 +1,143 @@ +--- +# For reference on dataset card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/datasetcard.md?plain=1 +# Doc / guide: https://huggingface.co/docs/hub/datasets-cards +{{ card_data }} +--- + +# Dataset Card for {{ pretty_name | default("Dataset Name", true) }} + + + +{{ dataset_summary | default("", true) }} + +## Dataset Details + +### Dataset Description + + + +{{ dataset_description | default("", true) }} + +- **Curated by:** {{ curators | default("[More Information Needed]", true)}} +- **Funded by [optional]:** {{ funded_by | default("[More Information Needed]", true)}} +- **Shared by [optional]:** {{ shared_by | default("[More Information Needed]", true)}} +- **Language(s) (NLP):** {{ language | default("[More Information Needed]", true)}} +- **License:** {{ license | default("[More Information Needed]", true)}} + +### Dataset Sources [optional] + + + +- **Repository:** {{ repo | default("[More Information Needed]", true)}} +- **Paper [optional]:** {{ paper | default("[More Information Needed]", true)}} +- **Demo [optional]:** {{ demo | default("[More Information Needed]", true)}} + +## Uses + + + +### Direct Use + + + +{{ direct_use | default("[More Information Needed]", true)}} + +### Out-of-Scope Use + + + +{{ out_of_scope_use | default("[More Information Needed]", true)}} + +## Dataset Structure + + + +{{ 
dataset_structure | default("[More Information Needed]", true)}} + +## Dataset Creation + +### Curation Rationale + + + +{{ curation_rationale_section | default("[More Information Needed]", true)}} + +### Source Data + + + +#### Data Collection and Processing + + + +{{ data_collection_and_processing_section | default("[More Information Needed]", true)}} + +#### Who are the source data producers? + + + +{{ source_data_producers_section | default("[More Information Needed]", true)}} + +### Annotations [optional] + + + +#### Annotation process + + + +{{ annotation_process_section | default("[More Information Needed]", true)}} + +#### Who are the annotators? + + + +{{ who_are_annotators_section | default("[More Information Needed]", true)}} + +#### Personal and Sensitive Information + + + +{{ personal_and_sensitive_information | default("[More Information Needed]", true)}} + +## Bias, Risks, and Limitations + + + +{{ bias_risks_limitations | default("[More Information Needed]", true)}} + +### Recommendations + + + +{{ bias_recommendations | default("Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.", true)}} + +## Citation [optional] + + + +**BibTeX:** + +{{ citation_bibtex | default("[More Information Needed]", true)}} + +**APA:** + +{{ citation_apa | default("[More Information Needed]", true)}} + +## Glossary [optional] + + + +{{ glossary | default("[More Information Needed]", true)}} + +## More Information [optional] + +{{ more_information | default("[More Information Needed]", true)}} + +## Dataset Card Authors [optional] + +{{ dataset_card_authors | default("[More Information Needed]", true)}} + +## Dataset Card Contact + +{{ dataset_card_contact | default("[More Information Needed]", true)}} diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/templates/modelcard_template.md b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/templates/modelcard_template.md new file mode 100644 index 0000000000000000000000000000000000000000..79ca15e4547debac763b390ef8e4b715e6f6403f --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/templates/modelcard_template.md @@ -0,0 +1,200 @@ +--- +# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1 +# Doc / guide: https://huggingface.co/docs/hub/model-cards +{{ card_data }} +--- + +# Model Card for {{ model_id | default("Model ID", true) }} + + + +{{ model_summary | default("", true) }} + +## Model Details + +### Model Description + + + +{{ model_description | default("", true) }} + +- **Developed by:** {{ developers | default("[More Information Needed]", true)}} +- **Funded by [optional]:** {{ funded_by | default("[More Information Needed]", true)}} +- **Shared by [optional]:** {{ shared_by | default("[More Information Needed]", true)}} +- **Model type:** {{ model_type | default("[More Information Needed]", true)}} +- **Language(s) (NLP):** {{ language | default("[More Information Needed]", true)}} +- **License:** {{ license | default("[More Information Needed]", true)}} +- **Finetuned from model [optional]:** {{ base_model | default("[More Information Needed]", true)}} + +### Model Sources [optional] + + + +- **Repository:** {{ repo | default("[More Information Needed]", true)}} +- **Paper [optional]:** {{ paper | default("[More Information Needed]", true)}} +- **Demo [optional]:** {{ demo | default("[More Information Needed]", true)}} + +## Uses + 
+ + +### Direct Use + + + +{{ direct_use | default("[More Information Needed]", true)}} + +### Downstream Use [optional] + + + +{{ downstream_use | default("[More Information Needed]", true)}} + +### Out-of-Scope Use + + + +{{ out_of_scope_use | default("[More Information Needed]", true)}} + +## Bias, Risks, and Limitations + + + +{{ bias_risks_limitations | default("[More Information Needed]", true)}} + +### Recommendations + + + +{{ bias_recommendations | default("Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.", true)}} + +## How to Get Started with the Model + +Use the code below to get started with the model. + +{{ get_started_code | default("[More Information Needed]", true)}} + +## Training Details + +### Training Data + + + +{{ training_data | default("[More Information Needed]", true)}} + +### Training Procedure + + + +#### Preprocessing [optional] + +{{ preprocessing | default("[More Information Needed]", true)}} + + +#### Training Hyperparameters + +- **Training regime:** {{ training_regime | default("[More Information Needed]", true)}} + +#### Speeds, Sizes, Times [optional] + + + +{{ speeds_sizes_times | default("[More Information Needed]", true)}} + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +{{ testing_data | default("[More Information Needed]", true)}} + +#### Factors + + + +{{ testing_factors | default("[More Information Needed]", true)}} + +#### Metrics + + + +{{ testing_metrics | default("[More Information Needed]", true)}} + +### Results + +{{ results | default("[More Information Needed]", true)}} + +#### Summary + +{{ results_summary | default("", true) }} + +## Model Examination [optional] + + + +{{ model_examination | default("[More Information Needed]", true)}} + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** {{ hardware_type | default("[More Information Needed]", true)}} +- **Hours used:** {{ hours_used | default("[More Information Needed]", true)}} +- **Cloud Provider:** {{ cloud_provider | default("[More Information Needed]", true)}} +- **Compute Region:** {{ cloud_region | default("[More Information Needed]", true)}} +- **Carbon Emitted:** {{ co2_emitted | default("[More Information Needed]", true)}} + +## Technical Specifications [optional] + +### Model Architecture and Objective + +{{ model_specs | default("[More Information Needed]", true)}} + +### Compute Infrastructure + +{{ compute_infrastructure | default("[More Information Needed]", true)}} + +#### Hardware + +{{ hardware_requirements | default("[More Information Needed]", true)}} + +#### Software + +{{ software | default("[More Information Needed]", true)}} + +## Citation [optional] + + + +**BibTeX:** + +{{ citation_bibtex | default("[More Information Needed]", true)}} + +**APA:** + +{{ citation_apa | default("[More Information Needed]", true)}} + +## Glossary [optional] + + + +{{ glossary | default("[More Information Needed]", true)}} + +## More Information [optional] + +{{ more_information | default("[More Information Needed]", true)}} + +## Model Card Authors [optional] + +{{ model_card_authors | default("[More Information Needed]", true)}} + +## Model Card Contact + +{{ model_card_contact | default("[More Information Needed]", true)}} diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..992eac104bd80de97444003172e926d5ad4522a0 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__init__.py @@ -0,0 +1,117 @@ +# coding=utf-8 +# Copyright 2021 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License + +# ruff: noqa: F401 +from huggingface_hub.errors import ( + BadRequestError, + CacheNotFound, + CorruptedCacheException, + DisabledRepoError, + EntryNotFoundError, + FileMetadataError, + GatedRepoError, + HfHubHTTPError, + HFValidationError, + LocalEntryNotFoundError, + LocalTokenNotFoundError, + NotASafetensorsRepoError, + OfflineModeIsEnabled, + RepositoryNotFoundError, + RevisionNotFoundError, + SafetensorsParsingError, +) + +from . 
import tqdm as _tqdm # _tqdm is the module +from ._auth import get_stored_tokens, get_token +from ._cache_assets import cached_assets_path +from ._cache_manager import ( + CachedFileInfo, + CachedRepoInfo, + CachedRevisionInfo, + DeleteCacheStrategy, + HFCacheInfo, + scan_cache_dir, +) +from ._chunk_utils import chunk_iterable +from ._datetime import parse_datetime +from ._experimental import experimental +from ._fixes import SoftTemporaryDirectory, WeakFileLock, yaml_dump +from ._git_credential import list_credential_helpers, set_git_credential, unset_git_credential +from ._headers import build_hf_headers, get_token_to_send +from ._hf_folder import HfFolder +from ._http import ( + configure_http_backend, + fix_hf_endpoint_in_url, + get_session, + hf_raise_for_status, + http_backoff, + reset_sessions, +) +from ._pagination import paginate +from ._paths import DEFAULT_IGNORE_PATTERNS, FORBIDDEN_FOLDERS, filter_repo_objects +from ._runtime import ( + dump_environment_info, + get_aiohttp_version, + get_fastai_version, + get_fastapi_version, + get_fastcore_version, + get_gradio_version, + get_graphviz_version, + get_hf_hub_version, + get_hf_transfer_version, + get_jinja_version, + get_numpy_version, + get_pillow_version, + get_pydantic_version, + get_pydot_version, + get_python_version, + get_tensorboard_version, + get_tf_version, + get_torch_version, + is_aiohttp_available, + is_colab_enterprise, + is_fastai_available, + is_fastapi_available, + is_fastcore_available, + is_google_colab, + is_gradio_available, + is_graphviz_available, + is_hf_transfer_available, + is_jinja_available, + is_notebook, + is_numpy_available, + is_package_available, + is_pillow_available, + is_pydantic_available, + is_pydot_available, + is_safetensors_available, + is_tensorboard_available, + is_tf_available, + is_torch_available, +) +from ._safetensors import SafetensorsFileMetadata, SafetensorsRepoMetadata, TensorInfo +from ._subprocess import capture_output, run_interactive_subprocess, run_subprocess +from ._telemetry import send_telemetry +from ._typing import is_jsonable, is_simple_optional_type, unwrap_simple_optional_type +from ._validators import smoothly_deprecate_use_auth_token, validate_hf_hub_args, validate_repo_id +from ._xet import ( + XetConnectionInfo, + XetFileData, + XetTokenType, + fetch_xet_connection_info_from_repo_info, + parse_xet_file_data_from_response, + refresh_xet_connection_info, +) +from .tqdm import are_progress_bars_disabled, disable_progress_bars, enable_progress_bars, tqdm, tqdm_stream_file diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..b68d29ad177a986ac1ae0776b7077d276aa57389 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_auth.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_auth.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..86b4efa0ad9d01c40293784c4c19cd18f52f3074 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_auth.cpython-312.pyc differ diff --git 
a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_cache_assets.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_cache_assets.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..2d3d5bf77fd155e968cf0543afdcb5e1ad27d494 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_cache_assets.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_cache_manager.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_cache_manager.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..956d8305585189d6221e800bc8e14ba7baff7643 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_cache_manager.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_chunk_utils.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_chunk_utils.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..48916f78aab3d2efb5c2d372236e5f3d75e98044 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_chunk_utils.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_datetime.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_datetime.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3fa3dfe386e8f5b8f015019182c7f56b18c02e9e Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_datetime.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_deprecation.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_deprecation.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..490501022c4b39f2a51cdb151eff356f595777d6 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_deprecation.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_dotenv.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_dotenv.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3afc413ca5337e5b597dbf6c335a86f4f67b631d Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_dotenv.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_experimental.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_experimental.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..bd9000eb15e901a68967a165c9844ad2e5fa30c0 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_experimental.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_fixes.cpython-312.pyc 
b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_fixes.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..38b5cb55d88c8be70ca0444517504e1946ae7782 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_fixes.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_git_credential.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_git_credential.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e9f4685299dbcd37fc41e503185ef9190dd343aa Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_git_credential.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_headers.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_headers.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..389ff625387b9570de5f8e3a0385e2623d026a70 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_headers.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_hf_folder.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_hf_folder.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..141df5f5ae351d50996a1dda60d718860b6e3c00 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_hf_folder.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_http.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_http.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..9fabd5bc9cf3ce949d978dd05cc72eb6cd534d9c Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_http.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_lfs.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_lfs.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..553069e063d88facf56fef5e52e5065960c52576 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_lfs.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_pagination.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_pagination.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c31cdd2c03b3398c6629d9229d00618e9225636c Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_pagination.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_paths.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_paths.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..b59d62dcb7032c9137741e3fdb5fb3b2e527ab76 Binary files 
/dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_paths.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_runtime.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_runtime.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..75216eaeb0941daf82f6784617c9710ab8f9fe65 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_runtime.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_safetensors.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_safetensors.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..a3d4346b311758e094caa9a1f6d830ce9054cbf2 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_safetensors.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_subprocess.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_subprocess.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..245d55bda0bc3cb2d5804c1e86c21d28e493c2db Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_subprocess.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_telemetry.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_telemetry.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..0464acc9a77afe06d0c744bd99cba70cdefeee6c Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_telemetry.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_typing.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_typing.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..4c2f1715d170c6496e31667ed59ab524ca7245be Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_typing.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_validators.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_validators.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..fb9b2f0612db34964d8cec7e3782a908bd09bf97 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_validators.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_xet.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_xet.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..8fefaa726f6c834e09995abce11cea397989bb9b Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_xet.cpython-312.pyc differ diff --git 
a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_xet_progress_reporting.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_xet_progress_reporting.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ba3cf2f450f2285ca763a662297931c0f6eeee38 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/_xet_progress_reporting.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/endpoint_helpers.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/endpoint_helpers.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6ba7e15f362f2d6c1387b7775cc701291d921937 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/endpoint_helpers.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/insecure_hashlib.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/insecure_hashlib.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c04a0bb0a1f5ad8c67f04abdbf7b0a62528301f0 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/insecure_hashlib.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/logging.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/logging.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..904d40fd89a96c2d25a3062cd72b19decd201357 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/logging.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/sha.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/sha.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c8decc5e569df33c728550a33a6fa90346649661 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/sha.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/tqdm.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/tqdm.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..b72c3f1f965a1893048ec0122217a0c3636c2d1b Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/__pycache__/tqdm.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_auth.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_auth.py new file mode 100644 index 0000000000000000000000000000000000000000..72be4dedbd94421ee2b4b2ba1073569d71b50569 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_auth.py @@ -0,0 +1,214 @@ +# Copyright 2023 The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains a helper to get the token from the machine (env variable, secret or config file).""" + +import configparser +import logging +import os +import warnings +from pathlib import Path +from threading import Lock +from typing import Dict, Optional + +from .. import constants +from ._runtime import is_colab_enterprise, is_google_colab + + +_IS_GOOGLE_COLAB_CHECKED = False +_GOOGLE_COLAB_SECRET_LOCK = Lock() +_GOOGLE_COLAB_SECRET: Optional[str] = None + +logger = logging.getLogger(__name__) + + +def get_token() -> Optional[str]: + """ + Get token if user is logged in. + + Note: in most cases, you should use [`huggingface_hub.utils.build_hf_headers`] instead. This method is only useful + if you want to retrieve the token for other purposes than sending an HTTP request. + + Token is retrieved in priority from the `HF_TOKEN` environment variable. Otherwise, we read the token file located + in the Hugging Face home folder. Returns None if user is not logged in. To log in, use [`login`] or + `hf auth login`. + + Returns: + `str` or `None`: The token, `None` if it doesn't exist. + """ + return _get_token_from_google_colab() or _get_token_from_environment() or _get_token_from_file() + + +def _get_token_from_google_colab() -> Optional[str]: + """Get token from Google Colab secrets vault using `google.colab.userdata.get(...)`. + + Token is read from the vault only once per session and then stored in a global variable to avoid re-requesting + access to the vault. + """ + # If it's not a Google Colab or it's Colab Enterprise, fall back to environment variable or token file authentication + if not is_google_colab() or is_colab_enterprise(): + return None + + # `google.colab.userdata` is not thread-safe + # This can lead to a deadlock if multiple threads try to access it at the same time + # (typically when using `snapshot_download`) + # => use a lock + # See https://github.com/huggingface/huggingface_hub/issues/1952 for more details. + with _GOOGLE_COLAB_SECRET_LOCK: + global _GOOGLE_COLAB_SECRET + global _IS_GOOGLE_COLAB_CHECKED + + if _IS_GOOGLE_COLAB_CHECKED: # request access only once + return _GOOGLE_COLAB_SECRET + + try: + from google.colab import userdata # type: ignore + from google.colab.errors import Error as ColabError # type: ignore + except ImportError: + return None + + try: + token = userdata.get("HF_TOKEN") + _GOOGLE_COLAB_SECRET = _clean_token(token) + except userdata.NotebookAccessError: + # Means the user has a secret called `HF_TOKEN`, got a popup "please grant access to HF_TOKEN" and refused it + # => warn user but ignore error => do not re-request access to user + warnings.warn( + "\nAccess to the secret `HF_TOKEN` has not been granted on this notebook." + "\nYou will not be requested again." + "\nPlease restart the session if you want to be prompted again." + ) + _GOOGLE_COLAB_SECRET = None + except userdata.SecretNotFoundError: + # Means the user did not define a `HF_TOKEN` secret => warn + warnings.warn( + "\nThe secret `HF_TOKEN` does not exist in your Colab secrets."
+ "\nTo authenticate with the Hugging Face Hub, create a token in your settings tab " + "(https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session." + "\nYou will be able to reuse this secret in all of your notebooks." + "\nPlease note that authentication is recommended but still optional to access public models or datasets." + ) + _GOOGLE_COLAB_SECRET = None + except ColabError as e: + # Something happen but we don't know what => recommend to open a GitHub issue + warnings.warn( + f"\nError while fetching `HF_TOKEN` secret value from your vault: '{str(e)}'." + "\nYou are not authenticated with the Hugging Face Hub in this notebook." + "\nIf the error persists, please let us know by opening an issue on GitHub " + "(https://github.com/huggingface/huggingface_hub/issues/new)." + ) + _GOOGLE_COLAB_SECRET = None + + _IS_GOOGLE_COLAB_CHECKED = True + return _GOOGLE_COLAB_SECRET + + +def _get_token_from_environment() -> Optional[str]: + # `HF_TOKEN` has priority (keep `HUGGING_FACE_HUB_TOKEN` for backward compatibility) + return _clean_token(os.environ.get("HF_TOKEN") or os.environ.get("HUGGING_FACE_HUB_TOKEN")) + + +def _get_token_from_file() -> Optional[str]: + try: + return _clean_token(Path(constants.HF_TOKEN_PATH).read_text()) + except FileNotFoundError: + return None + + +def get_stored_tokens() -> Dict[str, str]: + """ + Returns the parsed INI file containing the access tokens. + The file is located at `HF_STORED_TOKENS_PATH`, defaulting to `~/.cache/huggingface/stored_tokens`. + If the file does not exist, an empty dictionary is returned. + + Returns: `Dict[str, str]` + Key is the token name and value is the token. + """ + tokens_path = Path(constants.HF_STORED_TOKENS_PATH) + if not tokens_path.exists(): + stored_tokens = {} + config = configparser.ConfigParser() + try: + config.read(tokens_path) + stored_tokens = {token_name: config.get(token_name, "hf_token") for token_name in config.sections()} + except configparser.Error as e: + logger.error(f"Error parsing stored tokens file: {e}") + stored_tokens = {} + return stored_tokens + + +def _save_stored_tokens(stored_tokens: Dict[str, str]) -> None: + """ + Saves the given configuration to the stored tokens file. + + Args: + stored_tokens (`Dict[str, str]`): + The stored tokens to save. Key is the token name and value is the token. + """ + stored_tokens_path = Path(constants.HF_STORED_TOKENS_PATH) + + # Write the stored tokens into an INI file + config = configparser.ConfigParser() + for token_name in sorted(stored_tokens.keys()): + config.add_section(token_name) + config.set(token_name, "hf_token", stored_tokens[token_name]) + + stored_tokens_path.parent.mkdir(parents=True, exist_ok=True) + with stored_tokens_path.open("w") as config_file: + config.write(config_file) + + +def _get_token_by_name(token_name: str) -> Optional[str]: + """ + Get the token by name. + + Args: + token_name (`str`): + The name of the token to get. + + Returns: + `str` or `None`: The token, `None` if it doesn't exist. + + """ + stored_tokens = get_stored_tokens() + if token_name not in stored_tokens: + return None + return _clean_token(stored_tokens[token_name]) + + +def _save_token(token: str, token_name: str) -> None: + """ + Save the given token. + + If the stored tokens file does not exist, it will be created. + Args: + token (`str`): + The token to save. + token_name (`str`): + The name of the token. 
+ """ + tokens_path = Path(constants.HF_STORED_TOKENS_PATH) + stored_tokens = get_stored_tokens() + stored_tokens[token_name] = token + _save_stored_tokens(stored_tokens) + logger.info(f"The token `{token_name}` has been saved to {tokens_path}") + + +def _clean_token(token: Optional[str]) -> Optional[str]: + """Clean token by removing trailing and leading spaces and newlines. + + If token is an empty string, return None. + """ + if token is None: + return None + return token.replace("\r", "").replace("\n", "").strip() or None diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_cache_assets.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_cache_assets.py new file mode 100644 index 0000000000000000000000000000000000000000..e5d435df9b0bb0c67c0bcb5ef65711e9aef367f6 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_cache_assets.py @@ -0,0 +1,135 @@ +# coding=utf-8 +# Copyright 2019-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from pathlib import Path +from typing import Union + +from ..constants import HF_ASSETS_CACHE + + +def cached_assets_path( + library_name: str, + namespace: str = "default", + subfolder: str = "default", + *, + assets_dir: Union[str, Path, None] = None, +): + """Return a folder path to cache arbitrary files. + + `huggingface_hub` provides a canonical folder path to store assets. This is the + recommended way to integrate cache in a downstream library as it will benefit from + the builtins tools to scan and delete the cache properly. + + The distinction is made between files cached from the Hub and assets. Files from the + Hub are cached in a git-aware manner and entirely managed by `huggingface_hub`. See + [related documentation](https://huggingface.co/docs/huggingface_hub/how-to-cache). + All other files that a downstream library caches are considered to be "assets" + (files downloaded from external sources, extracted from a .tar archive, preprocessed + for training,...). + + Once the folder path is generated, it is guaranteed to exist and to be a directory. + The path is based on 3 levels of depth: the library name, a namespace and a + subfolder. Those 3 levels grants flexibility while allowing `huggingface_hub` to + expect folders when scanning/deleting parts of the assets cache. Within a library, + it is expected that all namespaces share the same subset of subfolder names but this + is not a mandatory rule. The downstream library has then full control on which file + structure to adopt within its cache. Namespace and subfolder are optional (would + default to a `"default/"` subfolder) but library name is mandatory as we want every + downstream library to manage its own cache. 
+ + Expected tree: + ```text + assets/ + └── datasets/ + │ ├── SQuAD/ + │ │ ├── downloaded/ + │ │ ├── extracted/ + │ │ └── processed/ + │ ├── Helsinki-NLP--tatoeba_mt/ + │ ├── downloaded/ + │ ├── extracted/ + │ └── processed/ + └── transformers/ + ├── default/ + │ ├── something/ + ├── bert-base-cased/ + │ ├── default/ + │ └── training/ + hub/ + └── models--julien-c--EsperBERTo-small/ + ├── blobs/ + │ ├── (...) + │ ├── (...) + ├── refs/ + │ └── (...) + └── [ 128] snapshots/ + ├── 2439f60ef33a0d46d85da5001d52aeda5b00ce9f/ + │ ├── (...) + └── bbc77c8132af1cc5cf678da3f1ddf2de43606d48/ + └── (...) + ``` + + + Args: + library_name (`str`): + Name of the library that will manage the cache folder. Example: `"dataset"`. + namespace (`str`, *optional*, defaults to "default"): + Namespace to which the data belongs. Example: `"SQuAD"`. + subfolder (`str`, *optional*, defaults to "default"): + Subfolder in which the data will be stored. Example: `extracted`. + assets_dir (`str`, `Path`, *optional*): + Path to the folder where assets are cached. This must not be the same folder + where Hub files are cached. Defaults to `HF_HOME / "assets"` if not provided. + Can also be set with `HF_ASSETS_CACHE` environment variable. + + Returns: + Path to the cache folder (`Path`). + + Example: + ```py + >>> from huggingface_hub import cached_assets_path + + >>> cached_assets_path(library_name="datasets", namespace="SQuAD", subfolder="download") + PosixPath('/home/wauplin/.cache/huggingface/extra/datasets/SQuAD/download') + + >>> cached_assets_path(library_name="datasets", namespace="SQuAD", subfolder="extracted") + PosixPath('/home/wauplin/.cache/huggingface/extra/datasets/SQuAD/extracted') + + >>> cached_assets_path(library_name="datasets", namespace="Helsinki-NLP/tatoeba_mt") + PosixPath('/home/wauplin/.cache/huggingface/extra/datasets/Helsinki-NLP--tatoeba_mt/default') + + >>> cached_assets_path(library_name="datasets", assets_dir="/tmp/tmp123456") + PosixPath('/tmp/tmp123456/datasets/default/default') + ``` + """ + # Resolve assets_dir + if assets_dir is None: + assets_dir = HF_ASSETS_CACHE + assets_dir = Path(assets_dir).expanduser().resolve() + + # Avoid names that could create path issues + for part in (" ", "/", "\\"): + library_name = library_name.replace(part, "--") + namespace = namespace.replace(part, "--") + subfolder = subfolder.replace(part, "--") + + # Path to subfolder is created + path = assets_dir / library_name / namespace / subfolder + try: + path.mkdir(exist_ok=True, parents=True) + except (FileExistsError, NotADirectoryError): + raise ValueError(f"Corrupted assets folder: cannot create directory because of an existing file ({path}).") + + # Return + return path diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_cache_manager.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_cache_manager.py new file mode 100644 index 0000000000000000000000000000000000000000..90d0e01f74812c5c3e65ba9313a155ee8e517927 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_cache_manager.py @@ -0,0 +1,866 @@ +# coding=utf-8 +# Copyright 2022-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains utilities to manage the HF cache directory.""" + +import os +import shutil +import time +from collections import defaultdict +from dataclasses import dataclass +from pathlib import Path +from typing import Dict, FrozenSet, List, Literal, Optional, Set, Union + +from huggingface_hub.errors import CacheNotFound, CorruptedCacheException + +from ..commands._cli_utils import tabulate +from ..constants import HF_HUB_CACHE +from . import logging + + +logger = logging.get_logger(__name__) + +REPO_TYPE_T = Literal["model", "dataset", "space"] + +# List of OS-created helper files that need to be ignored +FILES_TO_IGNORE = [".DS_Store"] + + +@dataclass(frozen=True) +class CachedFileInfo: + """Frozen data structure holding information about a single cached file. + + Args: + file_name (`str`): + Name of the file. Example: `config.json`. + file_path (`Path`): + Path of the file in the `snapshots` directory. The file path is a symlink + referring to a blob in the `blobs` folder. + blob_path (`Path`): + Path of the blob file. This is equivalent to `file_path.resolve()`. + size_on_disk (`int`): + Size of the blob file in bytes. + blob_last_accessed (`float`): + Timestamp of the last time the blob file has been accessed (from any + revision). + blob_last_modified (`float`): + Timestamp of the last time the blob file has been modified/created. + + > [!WARNING] + > `blob_last_accessed` and `blob_last_modified` reliability can depend on the OS you + > are using. See [python documentation](https://docs.python.org/3/library/os.html#os.stat_result) + > for more details. + """ + + file_name: str + file_path: Path + blob_path: Path + size_on_disk: int + + blob_last_accessed: float + blob_last_modified: float + + @property + def blob_last_accessed_str(self) -> str: + """ + (property) Timestamp of the last time the blob file has been accessed (from any + revision), returned as a human-readable string. + + Example: "2 weeks ago". + """ + return _format_timesince(self.blob_last_accessed) + + @property + def blob_last_modified_str(self) -> str: + """ + (property) Timestamp of the last time the blob file has been modified, returned + as a human-readable string. + + Example: "2 weeks ago". + """ + return _format_timesince(self.blob_last_modified) + + @property + def size_on_disk_str(self) -> str: + """ + (property) Size of the blob file as a human-readable string. + + Example: "42.2K". + """ + return _format_size(self.size_on_disk) + + +@dataclass(frozen=True) +class CachedRevisionInfo: + """Frozen data structure holding information about a revision. + + A revision corresponds to a folder in the `snapshots` folder and is populated with + the same tree structure as the repo on the Hub but contains only symlinks. A + revision can be either referenced by 1 or more `refs` or be "detached" (no refs). + + Args: + commit_hash (`str`): + Hash of the revision (unique). + Example: `"9338f7b671827df886678df2bdd7cc7b4f36dffd"`. + snapshot_path (`Path`): + Path to the revision directory in the `snapshots` folder. It contains the + same tree structure as the repo on the Hub.
+ files: (`FrozenSet[CachedFileInfo]`): + Set of [`~CachedFileInfo`] describing all files contained in the snapshot. + refs (`FrozenSet[str]`): + Set of `refs` pointing to this revision. If the revision has no `refs`, it + is considered detached. + Example: `{"main", "2.4.0"}` or `{"refs/pr/1"}`. + size_on_disk (`int`): + Sum of the blob file sizes that are symlink-ed by the revision. + last_modified (`float`): + Timestamp of the last time the revision has been created/modified. + + > [!WARNING] + > `last_accessed` cannot be determined correctly on a single revision as blob files + > are shared across revisions. + + > [!WARNING] + > `size_on_disk` is not necessarily the sum of all file sizes because of possible + > duplicated files. Besides, only blobs are taken into account, not the (negligible) + > size of folders and symlinks. + """ + + commit_hash: str + snapshot_path: Path + size_on_disk: int + files: FrozenSet[CachedFileInfo] + refs: FrozenSet[str] + + last_modified: float + + @property + def last_modified_str(self) -> str: + """ + (property) Timestamp of the last time the revision has been modified, returned + as a human-readable string. + + Example: "2 weeks ago". + """ + return _format_timesince(self.last_modified) + + @property + def size_on_disk_str(self) -> str: + """ + (property) Sum of the blob file sizes as a human-readable string. + + Example: "42.2K". + """ + return _format_size(self.size_on_disk) + + @property + def nb_files(self) -> int: + """ + (property) Total number of files in the revision. + """ + return len(self.files) + + +@dataclass(frozen=True) +class CachedRepoInfo: + """Frozen data structure holding information about a cached repository. + + Args: + repo_id (`str`): + Repo id of the repo on the Hub. Example: `"google/fleurs"`. + repo_type (`Literal["dataset", "model", "space"]`): + Type of the cached repo. + repo_path (`Path`): + Local path to the cached repo. + size_on_disk (`int`): + Sum of the blob file sizes in the cached repo. + nb_files (`int`): + Total number of blob files in the cached repo. + revisions (`FrozenSet[CachedRevisionInfo]`): + Set of [`~CachedRevisionInfo`] describing all revisions cached in the repo. + last_accessed (`float`): + Timestamp of the last time a blob file of the repo has been accessed. + last_modified (`float`): + Timestamp of the last time a blob file of the repo has been modified/created. + + > [!WARNING] + > `size_on_disk` is not necessarily the sum of all revisions sizes because of + > duplicated files. Besides, only blobs are taken into account, not the (negligible) + > size of folders and symlinks. + + > [!WARNING] + > `last_accessed` and `last_modified` reliability can depend on the OS you are using. + > See [python documentation](https://docs.python.org/3/library/os.html#os.stat_result) + > for more details. + """ + + repo_id: str + repo_type: REPO_TYPE_T + repo_path: Path + size_on_disk: int + nb_files: int + revisions: FrozenSet[CachedRevisionInfo] + + last_accessed: float + last_modified: float + + @property + def last_accessed_str(self) -> str: + """ + (property) Last time a blob file of the repo has been accessed, returned as a + human-readable string. + + Example: "2 weeks ago". + """ + return _format_timesince(self.last_accessed) + + @property + def last_modified_str(self) -> str: + """ + (property) Last time a blob file of the repo has been modified, returned as a + human-readable string. + + Example: "2 weeks ago". 
+ """ + return _format_timesince(self.last_modified) + + @property + def size_on_disk_str(self) -> str: + """ + (property) Sum of the blob file sizes as a human-readable string. + + Example: "42.2K". + """ + return _format_size(self.size_on_disk) + + @property + def refs(self) -> Dict[str, CachedRevisionInfo]: + """ + (property) Mapping between `refs` and revision data structures. + """ + return {ref: revision for revision in self.revisions for ref in revision.refs} + + +@dataclass(frozen=True) +class DeleteCacheStrategy: + """Frozen data structure holding the strategy to delete cached revisions. + + This object is not meant to be instantiated programmatically but to be returned by + [`~utils.HFCacheInfo.delete_revisions`]. See documentation for usage example. + + Args: + expected_freed_size (`float`): + Expected freed size once strategy is executed. + blobs (`FrozenSet[Path]`): + Set of blob file paths to be deleted. + refs (`FrozenSet[Path]`): + Set of reference file paths to be deleted. + repos (`FrozenSet[Path]`): + Set of entire repo paths to be deleted. + snapshots (`FrozenSet[Path]`): + Set of snapshots to be deleted (directory of symlinks). + """ + + expected_freed_size: int + blobs: FrozenSet[Path] + refs: FrozenSet[Path] + repos: FrozenSet[Path] + snapshots: FrozenSet[Path] + + @property + def expected_freed_size_str(self) -> str: + """ + (property) Expected size that will be freed as a human-readable string. + + Example: "42.2K". + """ + return _format_size(self.expected_freed_size) + + def execute(self) -> None: + """Execute the defined strategy. + + > [!WARNING] + > If this method is interrupted, the cache might get corrupted. Deletion order is + > implemented so that references and symlinks are deleted before the actual blob + > files. + + > [!WARNING] + > This method is irreversible. If executed, cached files are erased and must be + > downloaded again. + """ + # Deletion order matters. Blobs are deleted in last so that the user can't end + # up in a state where a `ref`` refers to a missing snapshot or a snapshot + # symlink refers to a deleted blob. + + # Delete entire repos + for path in self.repos: + _try_delete_path(path, path_type="repo") + + # Delete snapshot directories + for path in self.snapshots: + _try_delete_path(path, path_type="snapshot") + + # Delete refs files + for path in self.refs: + _try_delete_path(path, path_type="ref") + + # Delete blob files + for path in self.blobs: + _try_delete_path(path, path_type="blob") + + logger.info(f"Cache deletion done. Saved {self.expected_freed_size_str}.") + + +@dataclass(frozen=True) +class HFCacheInfo: + """Frozen data structure holding information about the entire cache-system. + + This data structure is returned by [`scan_cache_dir`] and is immutable. + + Args: + size_on_disk (`int`): + Sum of all valid repo sizes in the cache-system. + repos (`FrozenSet[CachedRepoInfo]`): + Set of [`~CachedRepoInfo`] describing all valid cached repos found on the + cache-system while scanning. + warnings (`List[CorruptedCacheException]`): + List of [`~CorruptedCacheException`] that occurred while scanning the cache. + Those exceptions are captured so that the scan can continue. Corrupted repos + are skipped from the scan. + + > [!WARNING] + > Here `size_on_disk` is equal to the sum of all repo sizes (only blobs). However if + > some cached repos are corrupted, their sizes are not taken into account. 
+ """ + + size_on_disk: int + repos: FrozenSet[CachedRepoInfo] + warnings: List[CorruptedCacheException] + + @property + def size_on_disk_str(self) -> str: + """ + (property) Sum of all valid repo sizes in the cache-system as a human-readable + string. + + Example: "42.2K". + """ + return _format_size(self.size_on_disk) + + def delete_revisions(self, *revisions: str) -> DeleteCacheStrategy: + """Prepare the strategy to delete one or more revisions cached locally. + + Input revisions can be any revision hash. If a revision hash is not found in the + local cache, a warning is thrown but no error is raised. Revisions can be from + different cached repos since hashes are unique across repos, + + Examples: + ```py + >>> from huggingface_hub import scan_cache_dir + >>> cache_info = scan_cache_dir() + >>> delete_strategy = cache_info.delete_revisions( + ... "81fd1d6e7847c99f5862c9fb81387956d99ec7aa" + ... ) + >>> print(f"Will free {delete_strategy.expected_freed_size_str}.") + Will free 7.9K. + >>> delete_strategy.execute() + Cache deletion done. Saved 7.9K. + ``` + + ```py + >>> from huggingface_hub import scan_cache_dir + >>> scan_cache_dir().delete_revisions( + ... "81fd1d6e7847c99f5862c9fb81387956d99ec7aa", + ... "e2983b237dccf3ab4937c97fa717319a9ca1a96d", + ... "6c0e6080953db56375760c0471a8c5f2929baf11", + ... ).execute() + Cache deletion done. Saved 8.6G. + ``` + + > [!WARNING] + > `delete_revisions` returns a [`~utils.DeleteCacheStrategy`] object that needs to + > be executed. The [`~utils.DeleteCacheStrategy`] is not meant to be modified but + > allows having a dry run before actually executing the deletion. + """ + hashes_to_delete: Set[str] = set(revisions) + + repos_with_revisions: Dict[CachedRepoInfo, Set[CachedRevisionInfo]] = defaultdict(set) + + for repo in self.repos: + for revision in repo.revisions: + if revision.commit_hash in hashes_to_delete: + repos_with_revisions[repo].add(revision) + hashes_to_delete.remove(revision.commit_hash) + + if len(hashes_to_delete) > 0: + logger.warning(f"Revision(s) not found - cannot delete them: {', '.join(hashes_to_delete)}") + + delete_strategy_blobs: Set[Path] = set() + delete_strategy_refs: Set[Path] = set() + delete_strategy_repos: Set[Path] = set() + delete_strategy_snapshots: Set[Path] = set() + delete_strategy_expected_freed_size = 0 + + for affected_repo, revisions_to_delete in repos_with_revisions.items(): + other_revisions = affected_repo.revisions - revisions_to_delete + + # If no other revisions, it means all revisions are deleted + # -> delete the entire cached repo + if len(other_revisions) == 0: + delete_strategy_repos.add(affected_repo.repo_path) + delete_strategy_expected_freed_size += affected_repo.size_on_disk + continue + + # Some revisions of the repo will be deleted but not all. We need to filter + # which blob files will not be linked anymore. 
+ for revision_to_delete in revisions_to_delete: + # Snapshot dir + delete_strategy_snapshots.add(revision_to_delete.snapshot_path) + + # Refs dir + for ref in revision_to_delete.refs: + delete_strategy_refs.add(affected_repo.repo_path / "refs" / ref) + + # Blobs dir + for file in revision_to_delete.files: + if file.blob_path not in delete_strategy_blobs: + is_file_alone = True + for revision in other_revisions: + for rev_file in revision.files: + if file.blob_path == rev_file.blob_path: + is_file_alone = False + break + if not is_file_alone: + break + + # Blob file not referenced by remaining revisions -> delete + if is_file_alone: + delete_strategy_blobs.add(file.blob_path) + delete_strategy_expected_freed_size += file.size_on_disk + + # Return the strategy instead of executing it. + return DeleteCacheStrategy( + blobs=frozenset(delete_strategy_blobs), + refs=frozenset(delete_strategy_refs), + repos=frozenset(delete_strategy_repos), + snapshots=frozenset(delete_strategy_snapshots), + expected_freed_size=delete_strategy_expected_freed_size, + ) + + def export_as_table(self, *, verbosity: int = 0) -> str: + """Generate a table from the [`HFCacheInfo`] object. + + Pass `verbosity=0` to get a table with a single row per repo, with columns + "repo_id", "repo_type", "size_on_disk", "nb_files", "last_accessed", "last_modified", "refs", "local_path". + + Pass `verbosity=1` to get a table with a row per repo and revision (thus multiple rows can appear for a single repo), with columns + "repo_id", "repo_type", "revision", "size_on_disk", "nb_files", "last_modified", "refs", "local_path". + + Example: + ```py + >>> from huggingface_hub.utils import scan_cache_dir + + >>> hf_cache_info = scan_cache_dir() + HFCacheInfo(...) + + >>> print(hf_cache_info.export_as_table()) + REPO ID REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS LOCAL PATH + --------------------------------------------------- --------- ------------ -------- ------------- ------------- ---- -------------------------------------------------------------------------------------------------- + roberta-base model 2.7M 5 1 day ago 1 week ago main ~/.cache/huggingface/hub/models--roberta-base + suno/bark model 8.8K 1 1 week ago 1 week ago main ~/.cache/huggingface/hub/models--suno--bark + t5-base model 893.8M 4 4 days ago 7 months ago main ~/.cache/huggingface/hub/models--t5-base + t5-large model 3.0G 4 5 weeks ago 5 months ago main ~/.cache/huggingface/hub/models--t5-large + + >>> print(hf_cache_info.export_as_table(verbosity=1)) + REPO ID REPO TYPE REVISION SIZE ON DISK NB FILES LAST_MODIFIED REFS LOCAL PATH + --------------------------------------------------- --------- ---------------------------------------- ------------ -------- ------------- ---- ----------------------------------------------------------------------------------------------------------------------------------------------------- + roberta-base model e2da8e2f811d1448a5b465c236feacd80ffbac7b 2.7M 5 1 week ago main ~/.cache/huggingface/hub/models--roberta-base/snapshots/e2da8e2f811d1448a5b465c236feacd80ffbac7b + suno/bark model 70a8a7d34168586dc5d028fa9666aceade177992 8.8K 1 1 week ago main ~/.cache/huggingface/hub/models--suno--bark/snapshots/70a8a7d34168586dc5d028fa9666aceade177992 + t5-base model a9723ea7f1b39c1eae772870f3b547bf6ef7e6c1 893.8M 4 7 months ago main ~/.cache/huggingface/hub/models--t5-base/snapshots/a9723ea7f1b39c1eae772870f3b547bf6ef7e6c1 + t5-large model 150ebc2c4b72291e770f58e6057481c8d2ed331a 3.0G 4 5 months ago main 
~/.cache/huggingface/hub/models--t5-large/snapshots/150ebc2c4b72291e770f58e6057481c8d2ed331a + ``` + + Args: + verbosity (`int`, *optional*): + The verbosity level. Defaults to 0. + + Returns: + `str`: The table as a string. + """ + if verbosity == 0: + return tabulate( + rows=[ + [ + repo.repo_id, + repo.repo_type, + "{:>12}".format(repo.size_on_disk_str), + repo.nb_files, + repo.last_accessed_str, + repo.last_modified_str, + ", ".join(sorted(repo.refs)), + str(repo.repo_path), + ] + for repo in sorted(self.repos, key=lambda repo: repo.repo_path) + ], + headers=[ + "REPO ID", + "REPO TYPE", + "SIZE ON DISK", + "NB FILES", + "LAST_ACCESSED", + "LAST_MODIFIED", + "REFS", + "LOCAL PATH", + ], + ) + else: + return tabulate( + rows=[ + [ + repo.repo_id, + repo.repo_type, + revision.commit_hash, + "{:>12}".format(revision.size_on_disk_str), + revision.nb_files, + revision.last_modified_str, + ", ".join(sorted(revision.refs)), + str(revision.snapshot_path), + ] + for repo in sorted(self.repos, key=lambda repo: repo.repo_path) + for revision in sorted(repo.revisions, key=lambda revision: revision.commit_hash) + ], + headers=[ + "REPO ID", + "REPO TYPE", + "REVISION", + "SIZE ON DISK", + "NB FILES", + "LAST_MODIFIED", + "REFS", + "LOCAL PATH", + ], + ) + + +def scan_cache_dir(cache_dir: Optional[Union[str, Path]] = None) -> HFCacheInfo: + """Scan the entire HF cache-system and return a [`~HFCacheInfo`] structure. + + Use `scan_cache_dir` in order to programmatically scan your cache-system. The cache + will be scanned repo by repo. If a repo is corrupted, a [`~CorruptedCacheException`] + will be thrown internally but captured and returned in the [`~HFCacheInfo`] + structure. Only valid repos get a proper report. + + ```py + >>> from huggingface_hub import scan_cache_dir + + >>> hf_cache_info = scan_cache_dir() + HFCacheInfo( + size_on_disk=3398085269, + repos=frozenset({ + CachedRepoInfo( + repo_id='t5-small', + repo_type='model', + repo_path=PosixPath(...), + size_on_disk=970726914, + nb_files=11, + revisions=frozenset({ + CachedRevisionInfo( + commit_hash='d78aea13fa7ecd06c29e3e46195d6341255065d5', + size_on_disk=970726339, + snapshot_path=PosixPath(...), + files=frozenset({ + CachedFileInfo( + file_name='config.json', + size_on_disk=1197 + file_path=PosixPath(...), + blob_path=PosixPath(...), + ), + CachedFileInfo(...), + ... + }), + ), + CachedRevisionInfo(...), + ... + }), + ), + CachedRepoInfo(...), + ... + }), + warnings=[ + CorruptedCacheException("Snapshots dir doesn't exist in cached repo: ..."), + CorruptedCacheException(...), + ... + ], + ) + ``` + + You can also print a detailed report directly from the `hf` command line using: + ```text + > hf cache scan + REPO ID REPO TYPE SIZE ON DISK NB FILES REFS LOCAL PATH + --------------------------- --------- ------------ -------- ------------------- ------------------------------------------------------------------------- + glue dataset 116.3K 15 1.17.0, main, 2.4.0 /Users/lucain/.cache/huggingface/hub/datasets--glue + google/fleurs dataset 64.9M 6 main, refs/pr/1 /Users/lucain/.cache/huggingface/hub/datasets--google--fleurs + Jean-Baptiste/camembert-ner model 441.0M 7 main /Users/lucain/.cache/huggingface/hub/models--Jean-Baptiste--camembert-ner + bert-base-cased model 1.9G 13 main /Users/lucain/.cache/huggingface/hub/models--bert-base-cased + t5-base model 10.1K 3 main /Users/lucain/.cache/huggingface/hub/models--t5-base + t5-small model 970.7M 11 refs/pr/1, main /Users/lucain/.cache/huggingface/hub/models--t5-small + + Done in 0.0s. 
Scanned 6 repo(s) for a total of 3.4G. + Got 1 warning(s) while scanning. Use -vvv to print details. + ``` + + Args: + cache_dir (`str` or `Path`, *optional*): + Cache directory to scan. Defaults to the default HF cache directory. + + > [!WARNING] + > Raises: + > + > `CacheNotFound` + > If the cache directory does not exist. + > + > [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError) + > If the cache directory is a file instead of a directory. + + Returns: a [`~HFCacheInfo`] object. + """ + if cache_dir is None: + cache_dir = HF_HUB_CACHE + + cache_dir = Path(cache_dir).expanduser().resolve() + if not cache_dir.exists(): + raise CacheNotFound( + f"Cache directory not found: {cache_dir}. Please use `cache_dir` argument or set `HF_HUB_CACHE` environment variable.", + cache_dir=cache_dir, + ) + + if cache_dir.is_file(): + raise ValueError( + f"Scan cache expects a directory but found a file: {cache_dir}. Please use `cache_dir` argument or set `HF_HUB_CACHE` environment variable." + ) + + repos: Set[CachedRepoInfo] = set() + warnings: List[CorruptedCacheException] = [] + for repo_path in cache_dir.iterdir(): + if repo_path.name == ".locks": # skip './.locks/' folder + continue + try: + repos.add(_scan_cached_repo(repo_path)) + except CorruptedCacheException as e: + warnings.append(e) + + return HFCacheInfo( + repos=frozenset(repos), + size_on_disk=sum(repo.size_on_disk for repo in repos), + warnings=warnings, + ) + + +def _scan_cached_repo(repo_path: Path) -> CachedRepoInfo: + """Scan a single cache repo and return information about it. + + Any unexpected behavior will raise a [`~CorruptedCacheException`]. + """ + if not repo_path.is_dir(): + raise CorruptedCacheException(f"Repo path is not a directory: {repo_path}") + + if "--" not in repo_path.name: + raise CorruptedCacheException(f"Repo path is not a valid HuggingFace cache directory: {repo_path}") + + repo_type, repo_id = repo_path.name.split("--", maxsplit=1) + repo_type = repo_type[:-1] # "models" -> "model" + repo_id = repo_id.replace("--", "/") # "google--fleurs" -> "google/fleurs" + + if repo_type not in {"dataset", "model", "space"}: + raise CorruptedCacheException( + f"Repo type must be `dataset`, `model` or `space`, found `{repo_type}` ({repo_path})."
+ ) + + blob_stats: Dict[Path, os.stat_result] = {} # Key is blob_path, value is blob stats + + snapshots_path = repo_path / "snapshots" + refs_path = repo_path / "refs" + + if not snapshots_path.exists() or not snapshots_path.is_dir(): + raise CorruptedCacheException(f"Snapshots dir doesn't exist in cached repo: {snapshots_path}") + + # Scan over `refs` directory + + # key is revision hash, value is set of refs + refs_by_hash: Dict[str, Set[str]] = defaultdict(set) + if refs_path.exists(): + # Example of `refs` directory + # ── refs + # ├── main + # └── refs + # └── pr + # └── 1 + if refs_path.is_file(): + raise CorruptedCacheException(f"Refs directory cannot be a file: {refs_path}") + + for ref_path in refs_path.glob("**/*"): + # glob("**/*") iterates over all files and directories -> skip directories + if ref_path.is_dir() or ref_path.name in FILES_TO_IGNORE: + continue + + ref_name = str(ref_path.relative_to(refs_path)) + with ref_path.open() as f: + commit_hash = f.read() + + refs_by_hash[commit_hash].add(ref_name) + + # Scan snapshots directory + cached_revisions: Set[CachedRevisionInfo] = set() + for revision_path in snapshots_path.iterdir(): + # Ignore OS-created helper files + if revision_path.name in FILES_TO_IGNORE: + continue + if revision_path.is_file(): + raise CorruptedCacheException(f"Snapshots folder corrupted. Found a file: {revision_path}") + + cached_files = set() + for file_path in revision_path.glob("**/*"): + # glob("**/*") iterates over all files and directories -> skip directories + if file_path.is_dir(): + continue + + blob_path = Path(file_path).resolve() + if not blob_path.exists(): + raise CorruptedCacheException(f"Blob missing (broken symlink): {blob_path}") + + if blob_path not in blob_stats: + blob_stats[blob_path] = blob_path.stat() + + cached_files.add( + CachedFileInfo( + file_name=file_path.name, + file_path=file_path, + size_on_disk=blob_stats[blob_path].st_size, + blob_path=blob_path, + blob_last_accessed=blob_stats[blob_path].st_atime, + blob_last_modified=blob_stats[blob_path].st_mtime, + ) + ) + + # Last modified is either the last modified blob file or the revision folder + # itself if it is empty + if len(cached_files) > 0: + revision_last_modified = max(blob_stats[file.blob_path].st_mtime for file in cached_files) + else: + revision_last_modified = revision_path.stat().st_mtime + + cached_revisions.add( + CachedRevisionInfo( + commit_hash=revision_path.name, + files=frozenset(cached_files), + refs=frozenset(refs_by_hash.pop(revision_path.name, set())), + size_on_disk=sum( + blob_stats[blob_path].st_size for blob_path in set(file.blob_path for file in cached_files) + ), + snapshot_path=revision_path, + last_modified=revision_last_modified, + ) + ) + + # Check that all refs refer to an existing revision + if len(refs_by_hash) > 0: + raise CorruptedCacheException( + f"Reference(s) refer to missing commit hashes: {dict(refs_by_hash)} ({repo_path})." + ) + + # Last modified is either the last modified blob file or the repo folder itself if + # no blob files have been found. Same for last accessed.
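+    # (`blob_stats` is keyed by resolved blob path, so the `nb_files` and
+    # `size_on_disk` aggregates computed below count each deduplicated blob
+    # exactly once, even when several revisions symlink to the same blob.)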
+ if len(blob_stats) > 0: + repo_last_accessed = max(stat.st_atime for stat in blob_stats.values()) + repo_last_modified = max(stat.st_mtime for stat in blob_stats.values()) + else: + repo_stats = repo_path.stat() + repo_last_accessed = repo_stats.st_atime + repo_last_modified = repo_stats.st_mtime + + # Build and return frozen structure + return CachedRepoInfo( + nb_files=len(blob_stats), + repo_id=repo_id, + repo_path=repo_path, + repo_type=repo_type, # type: ignore + revisions=frozenset(cached_revisions), + size_on_disk=sum(stat.st_size for stat in blob_stats.values()), + last_accessed=repo_last_accessed, + last_modified=repo_last_modified, + ) + + +def _format_size(num: int) -> str: + """Format size in bytes into a human-readable string. + + Taken from https://stackoverflow.com/a/1094933 + """ + num_f = float(num) + for unit in ["", "K", "M", "G", "T", "P", "E", "Z"]: + if abs(num_f) < 1000.0: + return f"{num_f:3.1f}{unit}" + num_f /= 1000.0 + return f"{num_f:.1f}Y" + + +_TIMESINCE_CHUNKS = ( + # Label, divider, max value + ("second", 1, 60), + ("minute", 60, 60), + ("hour", 60 * 60, 24), + ("day", 60 * 60 * 24, 6), + ("week", 60 * 60 * 24 * 7, 6), + ("month", 60 * 60 * 24 * 30, 11), + ("year", 60 * 60 * 24 * 365, None), +) + + +def _format_timesince(ts: float) -> str: + """Format timestamp in seconds into a human-readable string, relative to now. + + Vaguely inspired by Django's `timesince` formatter. + """ + delta = time.time() - ts + if delta < 20: + return "a few seconds ago" + for label, divider, max_value in _TIMESINCE_CHUNKS: # noqa: B007 + value = round(delta / divider) + if max_value is not None and value <= max_value: + break + return f"{value} {label}{'s' if value > 1 else ''} ago" + + +def _try_delete_path(path: Path, path_type: str) -> None: + """Try to delete a local file or folder. + + If the path does not exist, the error is logged as a warning and then ignored. + + Args: + path (`Path`): + Path to delete. Can be a file or a folder. + path_type (`str`): + What type of path are we deleting? Only for logging purposes. Example: "snapshot". + """ + logger.info(f"Delete {path_type}: {path}") + try: + if path.is_file(): + os.remove(path) + else: + shutil.rmtree(path) + except FileNotFoundError: + logger.warning(f"Couldn't delete {path_type}: file not found ({path})", exc_info=True) + except PermissionError: + logger.warning(f"Couldn't delete {path_type}: permission denied ({path})", exc_info=True) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_chunk_utils.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_chunk_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..fe8ecc9c94f9c09503761e734a005124d3291a52 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_chunk_utils.py @@ -0,0 +1,64 @@ +# coding=utf-8 +# Copyright 2022-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
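+# Note on the helper below: each yielded chunk is a lazy `itertools.chain` backed
+# by the single shared iterator, so chunks must be consumed in order (e.g. wrapped
+# in `list(...)`) before advancing to the next one. A minimal sketch:
+#
+#     from huggingface_hub.utils import chunk_iterable
+#
+#     for chunk in chunk_iterable(range(10), chunk_size=4):
+#         print(list(chunk))  # [0, 1, 2, 3], then [4, 5, 6, 7], then [8, 9]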
+"""Contains a utility to iterate by chunks over an iterator.""" + +import itertools +from typing import Iterable, TypeVar + + +T = TypeVar("T") + + +def chunk_iterable(iterable: Iterable[T], chunk_size: int) -> Iterable[Iterable[T]]: + """Iterates over an iterator chunk by chunk. + + Taken from https://stackoverflow.com/a/8998040. + See also https://github.com/huggingface/huggingface_hub/pull/920#discussion_r938793088. + + Args: + iterable (`Iterable`): + The iterable on which we want to iterate. + chunk_size (`int`): + Size of the chunks. Must be a strictly positive integer (e.g. >0). + + Example: + + ```python + >>> from huggingface_hub.utils import chunk_iterable + + >>> for items in chunk_iterable(range(17), chunk_size=8): + ... print(items) + # [0, 1, 2, 3, 4, 5, 6, 7] + # [8, 9, 10, 11, 12, 13, 14, 15] + # [16] # smaller last chunk + ``` + + Raises: + [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError) + If `chunk_size` <= 0. + + > [!WARNING] + > The last chunk can be smaller than `chunk_size`. + """ + if not isinstance(chunk_size, int) or chunk_size <= 0: + raise ValueError("`chunk_size` must be a strictly positive integer (>0).") + + iterator = iter(iterable) + while True: + try: + next_item = next(iterator) + except StopIteration: + return + yield itertools.chain((next_item,), itertools.islice(iterator, chunk_size - 1)) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_datetime.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_datetime.py new file mode 100644 index 0000000000000000000000000000000000000000..1a7f44285d1c826006c97176ca66c3e9c33f61c0 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_datetime.py @@ -0,0 +1,67 @@ +# coding=utf-8 +# Copyright 2022-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains utilities to handle datetimes in Huggingface Hub.""" + +from datetime import datetime, timezone + + +def parse_datetime(date_string: str) -> datetime: + """ + Parses a date_string returned from the server to a datetime object. + + This parser is a weak-parser is the sense that it handles only a single format of + date_string. It is expected that the server format will never change. The + implementation depends only on the standard lib to avoid an external dependency + (python-dateutil). See full discussion about this decision on PR: + https://github.com/huggingface/huggingface_hub/pull/999. + + Example: + ```py + > parse_datetime('2022-08-19T07:19:38.123Z') + datetime.datetime(2022, 8, 19, 7, 19, 38, 123000, tzinfo=timezone.utc) + ``` + + Args: + date_string (`str`): + A string representing a datetime returned by the Hub server. + String is expected to follow '%Y-%m-%dT%H:%M:%S.%fZ' pattern. + + Returns: + A python datetime object. + + Raises: + :class:`ValueError`: + If `date_string` cannot be parsed. 
+ """ + try: + # Normalize the string to always have 6 digits of fractional seconds + if date_string.endswith("Z"): + # Case 1: No decimal point (e.g., "2024-11-16T00:27:02Z") + if "." not in date_string: + # No fractional seconds - insert .000000 + date_string = date_string[:-1] + ".000000Z" + # Case 2: Has decimal point (e.g., "2022-08-19T07:19:38.123456789Z") + else: + # Get the fractional and base parts + base, fraction = date_string[:-1].split(".") + # fraction[:6] takes first 6 digits and :0<6 pads with zeros if less than 6 digits + date_string = f"{base}.{fraction[:6]:0<6}Z" + + return datetime.strptime(date_string, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc) + except ValueError as e: + raise ValueError( + f"Cannot parse '{date_string}' as a datetime. Date string is expected to" + " follow '%Y-%m-%dT%H:%M:%S.%fZ' pattern." + ) from e diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_deprecation.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_deprecation.py new file mode 100644 index 0000000000000000000000000000000000000000..4cb8d6e418c76accd1ecd61158b4bdd265e12f71 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_deprecation.py @@ -0,0 +1,136 @@ +import warnings +from functools import wraps +from inspect import Parameter, signature +from typing import Iterable, Optional + + +def _deprecate_positional_args(*, version: str): + """Decorator for methods that issues warnings for positional arguments. + Using the keyword-only argument syntax in pep 3102, arguments after the + * will issue a warning when passed as a positional argument. + + Args: + version (`str`): + The version when positional arguments will result in error. + """ + + def _inner_deprecate_positional_args(f): + sig = signature(f) + kwonly_args = [] + all_args = [] + for name, param in sig.parameters.items(): + if param.kind == Parameter.POSITIONAL_OR_KEYWORD: + all_args.append(name) + elif param.kind == Parameter.KEYWORD_ONLY: + kwonly_args.append(name) + + @wraps(f) + def inner_f(*args, **kwargs): + extra_args = len(args) - len(all_args) + if extra_args <= 0: + return f(*args, **kwargs) + # extra_args > 0 + args_msg = [ + f"{name}='{arg}'" if isinstance(arg, str) else f"{name}={arg}" + for name, arg in zip(kwonly_args[:extra_args], args[-extra_args:]) + ] + args_msg = ", ".join(args_msg) + warnings.warn( + f"Deprecated positional argument(s) used in '{f.__name__}': pass" + f" {args_msg} as keyword args. From version {version} passing these" + " as positional arguments will result in an error,", + FutureWarning, + ) + kwargs.update(zip(sig.parameters, args)) + return f(**kwargs) + + return inner_f + + return _inner_deprecate_positional_args + + +def _deprecate_arguments( + *, + version: str, + deprecated_args: Iterable[str], + custom_message: Optional[str] = None, +): + """Decorator to issue warnings when using deprecated arguments. + + TODO: could be useful to be able to set a custom error message. + + Args: + version (`str`): + The version when deprecated arguments will result in error. + deprecated_args (`List[str]`): + List of the arguments to be deprecated. + custom_message (`str`, *optional*): + Warning message that is raised. If not passed, a default warning message + will be created. 
+ """ + + def _inner_deprecate_positional_args(f): + sig = signature(f) + + @wraps(f) + def inner_f(*args, **kwargs): + # Check for used deprecated arguments + used_deprecated_args = [] + for _, parameter in zip(args, sig.parameters.values()): + if parameter.name in deprecated_args: + used_deprecated_args.append(parameter.name) + for kwarg_name, kwarg_value in kwargs.items(): + if ( + # If argument is deprecated but still used + kwarg_name in deprecated_args + # And then the value is not the default value + and kwarg_value != sig.parameters[kwarg_name].default + ): + used_deprecated_args.append(kwarg_name) + + # Warn and proceed + if len(used_deprecated_args) > 0: + message = ( + f"Deprecated argument(s) used in '{f.__name__}':" + f" {', '.join(used_deprecated_args)}. Will not be supported from" + f" version '{version}'." + ) + if custom_message is not None: + message += "\n\n" + custom_message + warnings.warn(message, FutureWarning) + return f(*args, **kwargs) + + return inner_f + + return _inner_deprecate_positional_args + + +def _deprecate_method(*, version: str, message: Optional[str] = None): + """Decorator to issue warnings when using a deprecated method. + + Args: + version (`str`): + The version when deprecated arguments will result in error. + message (`str`, *optional*): + Warning message that is raised. If not passed, a default warning message + will be created. + """ + + def _inner_deprecate_method(f): + name = f.__name__ + if name == "__init__": + name = f.__qualname__.split(".")[0] # class name instead of method name + + @wraps(f) + def inner_f(*args, **kwargs): + warning_message = ( + f"'{name}' (from '{f.__module__}') is deprecated and will be removed from version '{version}'." + ) + if message is not None: + warning_message += " " + message + warnings.warn(warning_message, FutureWarning) + return f(*args, **kwargs) + + return inner_f + + return _inner_deprecate_method diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_dotenv.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_dotenv.py new file mode 100644 index 0000000000000000000000000000000000000000..23b8a1b70a4827fc8ae4149c2b1b1e4b00ed7ca2 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_dotenv.py @@ -0,0 +1,55 @@ +# AI-generated module (ChatGPT) +import re +from typing import Dict, Optional + + +def load_dotenv(dotenv_str: str, environ: Optional[Dict[str, str]] = None) -> Dict[str, str]: + """ + Parse a DOTENV-format string and return a dictionary of key-value pairs. + Handles quoted values, comments, export keyword, and blank lines. + """ + env: Dict[str, str] = {} + line_pattern = re.compile( + r""" + ^\s* + (?:export[^\S\n]+)? # optional export + ([A-Za-z_][A-Za-z0-9_]*) # key + [^\S\n]*(=)?[^\S\n]* + ( # value group + (?: + '(?:\\'|[^'])*' # single-quoted value + | \"(?:\\\"|[^\"])*\" # double-quoted value + | [^#\n\r]+? # unquoted value + ) + )? 
+ [^\S\n]*(?:\#.*)?$ # optional inline comment + """, + re.VERBOSE, + ) + + for line in dotenv_str.splitlines(): + line = line.strip() + if not line or line.startswith("#"): + continue # Skip comments and empty lines + + match = line_pattern.match(line) + if match: + key = match.group(1) + val = None + if match.group(2): # if there is '=' + raw_val = match.group(3) or "" + val = raw_val.strip() + # Remove surrounding quotes if quoted + if (val.startswith('"') and val.endswith('"')) or (val.startswith("'") and val.endswith("'")): + val = val[1:-1] + val = val.replace(r"\n", "\n").replace(r"\t", "\t").replace(r"\"", '"').replace(r"\\", "\\") + if raw_val.startswith('"'): + val = val.replace(r"\$", "$") # only in double quotes + elif environ is not None: + # Get it from the current environment + val = environ.get(key) + + if val is not None: + env[key] = val + + return env diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_experimental.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_experimental.py new file mode 100644 index 0000000000000000000000000000000000000000..40b0ed90ff8af6797758d59b93019498cd72f9ad --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_experimental.py @@ -0,0 +1,68 @@ +# coding=utf-8 +# Copyright 2023-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains utilities to flag a feature as "experimental" in Huggingface Hub.""" + +import warnings +from functools import wraps +from typing import Callable + +from .. import constants + + +def experimental(fn: Callable) -> Callable: + """Decorator to flag a feature as experimental. + + An experimental feature triggers a warning when used as it might be subject to breaking changes without prior notice + in the future. + + Warnings can be disabled by setting `HF_HUB_DISABLE_EXPERIMENTAL_WARNING=1` as environment variable. + + Args: + fn (`Callable`): + The function to flag as experimental. + + Returns: + `Callable`: The decorated function. + + Example: + + ```python + >>> from huggingface_hub.utils import experimental + + >>> @experimental + ... def my_function(): + ... print("Hello world!") + + >>> my_function() + UserWarning: 'my_function' is experimental and might be subject to breaking changes in the future without prior + notice. You can disable this warning by setting `HF_HUB_DISABLE_EXPERIMENTAL_WARNING=1` as environment variable. + Hello world! + ``` + """ + # For classes, put the "experimental" around the "__new__" method => __new__ will be removed in warning message + name = fn.__qualname__[: -len(".__new__")] if fn.__qualname__.endswith(".__new__") else fn.__qualname__ + + @wraps(fn) + def _inner_fn(*args, **kwargs): + if not constants.HF_HUB_DISABLE_EXPERIMENTAL_WARNING: + warnings.warn( + f"'{name}' is experimental and might be subject to breaking changes in the future without prior notice." 
+ " You can disable this warning by setting `HF_HUB_DISABLE_EXPERIMENTAL_WARNING=1` as environment" + " variable.", + UserWarning, + ) + return fn(*args, **kwargs) + + return _inner_fn diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_fixes.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_fixes.py new file mode 100644 index 0000000000000000000000000000000000000000..560003b6222058b03791491b1ce70ea9d7a94404 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_fixes.py @@ -0,0 +1,133 @@ +# JSONDecodeError was introduced in requests=2.27 released in 2022. +# This allows us to support older requests for users +# More information: https://github.com/psf/requests/pull/5856 +try: + from requests import JSONDecodeError # type: ignore # noqa: F401 +except ImportError: + try: + from simplejson import JSONDecodeError # type: ignore # noqa: F401 + except ImportError: + from json import JSONDecodeError # type: ignore # noqa: F401 +import contextlib +import os +import shutil +import stat +import tempfile +import time +from functools import partial +from pathlib import Path +from typing import Callable, Generator, Optional, Union + +import yaml +from filelock import BaseFileLock, FileLock, SoftFileLock, Timeout + +from .. import constants +from . import logging + + +logger = logging.get_logger(__name__) + +# Wrap `yaml.dump` to set `allow_unicode=True` by default. +# +# Example: +# ```py +# >>> yaml.dump({"emoji": "👀", "some unicode": "日本か"}) +# 'emoji: "\\U0001F440"\nsome unicode: "\\u65E5\\u672C\\u304B"\n' +# +# >>> yaml_dump({"emoji": "👀", "some unicode": "日本か"}) +# 'emoji: "👀"\nsome unicode: "日本か"\n' +# ``` +yaml_dump: Callable[..., str] = partial(yaml.dump, stream=None, allow_unicode=True) # type: ignore + + +@contextlib.contextmanager +def SoftTemporaryDirectory( + suffix: Optional[str] = None, + prefix: Optional[str] = None, + dir: Optional[Union[Path, str]] = None, + **kwargs, +) -> Generator[Path, None, None]: + """ + Context manager to create a temporary directory and safely delete it. + + If tmp directory cannot be deleted normally, we set the WRITE permission and retry. + If cleanup still fails, we give up but don't raise an exception. This is equivalent + to `tempfile.TemporaryDirectory(..., ignore_cleanup_errors=True)` introduced in + Python 3.10. + + See https://www.scivision.dev/python-tempfile-permission-error-windows/. + """ + tmpdir = tempfile.TemporaryDirectory(prefix=prefix, suffix=suffix, dir=dir, **kwargs) + yield Path(tmpdir.name).resolve() + + try: + # First once with normal cleanup + shutil.rmtree(tmpdir.name) + except Exception: + # If failed, try to set write permission and retry + try: + shutil.rmtree(tmpdir.name, onerror=_set_write_permission_and_retry) + except Exception: + pass + + # And finally, cleanup the tmpdir. + # If it fails again, give up but do not throw error + try: + tmpdir.cleanup() + except Exception: + pass + + +def _set_write_permission_and_retry(func, path, excinfo): + os.chmod(path, stat.S_IWRITE) + func(path) + + +@contextlib.contextmanager +def WeakFileLock( + lock_file: Union[str, Path], *, timeout: Optional[float] = None +) -> Generator[BaseFileLock, None, None]: + """A filelock with some custom logic. + + This filelock is weaker than the default filelock in that: + 1. It won't raise an exception if release fails. + 2. It will default to a SoftFileLock if the filesystem does not support flock. 
+
+    An INFO log message is emitted every 10 seconds if the lock is not acquired immediately.
+    If a timeout is provided, a `filelock.Timeout` exception is raised if the lock is not acquired within the timeout.
+    """
+    log_interval = constants.FILELOCK_LOG_EVERY_SECONDS
+    lock = FileLock(lock_file, timeout=log_interval)
+    start_time = time.time()
+
+    while True:
+        elapsed_time = time.time() - start_time
+        if timeout is not None and elapsed_time >= timeout:
+            raise Timeout(str(lock_file))
+
+        try:
+            lock.acquire(timeout=min(log_interval, timeout - elapsed_time) if timeout else log_interval)
+        except Timeout:
+            logger.info(
+                f"Still waiting to acquire lock on {lock_file} (elapsed: {time.time() - start_time:.1f} seconds)"
+            )
+        except NotImplementedError as e:
+            if "use SoftFileLock instead" in str(e):
+                logger.warning(
+                    "FileSystem does not appear to support flock. Falling back to SoftFileLock for %s", lock_file
+                )
+                lock = SoftFileLock(lock_file, timeout=log_interval)
+                continue
+        else:
+            break
+
+    try:
+        yield lock
+    finally:
+        try:
+            lock.release()
+        except OSError:
+            try:
+                Path(lock_file).unlink()
+            except OSError:
+                pass
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_git_credential.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_git_credential.py
new file mode 100644
index 0000000000000000000000000000000000000000..5ad84648a0093de6e6defc178e4ffffe985f50e4
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_git_credential.py
@@ -0,0 +1,121 @@
+# coding=utf-8
+# Copyright 2022-present, the HuggingFace Inc. team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Contains utilities to manage Git credentials."""
+
+import re
+import subprocess
+from typing import List, Optional
+
+from ..constants import ENDPOINT
+from ._subprocess import run_interactive_subprocess, run_subprocess
+
+
+GIT_CREDENTIAL_REGEX = re.compile(
+    r"""
+        ^\s*                 # start of line
+        credential\.helper   # credential.helper value
+        \s*=\s*              # separator
+        ([\w\-\/]+)          # the helper name or absolute path (group 1)
+        (\s|$)               # whitespace or end of line
+    """,
+    flags=re.MULTILINE | re.IGNORECASE | re.VERBOSE,
+)
+
+
+def list_credential_helpers(folder: Optional[str] = None) -> List[str]:
+    """Return the list of git credential helpers configured.
+
+    See https://git-scm.com/docs/gitcredentials.
+
+    Reads the local git configuration (calls "`git config --list`" internally) and
+    returns the names of all configured helpers (store, cache, macOS keychain,...).
+
+    Args:
+        folder (`str`, *optional*):
+            The folder in which to check the configured helpers.
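+
+    Example (illustrative; the actual output depends on your local git configuration):
+
+    ```py
+    >>> from huggingface_hub.utils._git_credential import list_credential_helpers
+    >>> list_credential_helpers()
+    ['cache', 'store']
+    ```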
+ """ + try: + output = run_subprocess("git config --list", folder=folder).stdout + parsed = _parse_credential_output(output) + return parsed + except subprocess.CalledProcessError as exc: + raise EnvironmentError(exc.stderr) + + +def set_git_credential(token: str, username: str = "hf_user", folder: Optional[str] = None) -> None: + """Save a username/token pair in git credential for HF Hub registry. + + Credentials are saved in all configured helpers (store, cache, macOS keychain,...). + Calls "`git credential approve`" internally. See https://git-scm.com/docs/git-credential. + + Args: + username (`str`, defaults to `"hf_user"`): + A git username. Defaults to `"hf_user"`, the default user used in the Hub. + token (`str`, defaults to `"hf_user"`): + A git password. In practice, the User Access Token for the Hub. + See https://huggingface.co/settings/tokens. + folder (`str`, *optional*): + The folder in which to check the configured helpers. + """ + with run_interactive_subprocess("git credential approve", folder=folder) as ( + stdin, + _, + ): + stdin.write(f"url={ENDPOINT}\nusername={username.lower()}\npassword={token}\n\n") + stdin.flush() + + +def unset_git_credential(username: str = "hf_user", folder: Optional[str] = None) -> None: + """Erase credentials from git credential for HF Hub registry. + + Credentials are erased from the configured helpers (store, cache, macOS + keychain,...), if any. If `username` is not provided, any credential configured for + HF Hub endpoint is erased. + Calls "`git credential erase`" internally. See https://git-scm.com/docs/git-credential. + + Args: + username (`str`, defaults to `"hf_user"`): + A git username. Defaults to `"hf_user"`, the default user used in the Hub. + folder (`str`, *optional*): + The folder in which to check the configured helpers. + """ + with run_interactive_subprocess("git credential reject", folder=folder) as ( + stdin, + _, + ): + standard_input = f"url={ENDPOINT}\n" + if username is not None: + standard_input += f"username={username.lower()}\n" + standard_input += "\n" + + stdin.write(standard_input) + stdin.flush() + + +def _parse_credential_output(output: str) -> List[str]: + """Parse the output of `git credential fill` to extract the password. + + Args: + output (`str`): + The output of `git credential fill`. + """ + # NOTE: If user has set an helper for a custom URL, it will not we caught here. + # Example: `credential.https://huggingface.co.helper=store` + # See: https://github.com/huggingface/huggingface_hub/pull/1138#discussion_r1013324508 + return sorted( # Sort for nice printing + set( # Might have some duplicates + match[0] for match in GIT_CREDENTIAL_REGEX.findall(output) + ) + ) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_headers.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_headers.py new file mode 100644 index 0000000000000000000000000000000000000000..053a92a398f8734ee14cd67e4b514dfc350fcecd --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_headers.py @@ -0,0 +1,228 @@ +# coding=utf-8 +# Copyright 2022-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains utilities to handle headers to send in calls to Huggingface Hub.""" + +from typing import Dict, Optional, Union + +from huggingface_hub.errors import LocalTokenNotFoundError + +from .. import constants +from ._auth import get_token +from ._deprecation import _deprecate_arguments +from ._runtime import ( + get_fastai_version, + get_fastcore_version, + get_hf_hub_version, + get_python_version, + get_tf_version, + get_torch_version, + is_fastai_available, + is_fastcore_available, + is_tf_available, + is_torch_available, +) +from ._validators import validate_hf_hub_args + + +@_deprecate_arguments( + version="1.0", + deprecated_args="is_write_action", + custom_message="This argument is ignored and we let the server handle the permission error instead (if any).", +) +@validate_hf_hub_args +def build_hf_headers( + *, + token: Optional[Union[bool, str]] = None, + library_name: Optional[str] = None, + library_version: Optional[str] = None, + user_agent: Union[Dict, str, None] = None, + headers: Optional[Dict[str, str]] = None, + is_write_action: bool = False, +) -> Dict[str, str]: + """ + Build headers dictionary to send in a HF Hub call. + + By default, authorization token is always provided either from argument (explicit + use) or retrieved from the cache (implicit use). To explicitly avoid sending the + token to the Hub, set `token=False` or set the `HF_HUB_DISABLE_IMPLICIT_TOKEN` + environment variable. + + In case of an API call that requires write access, an error is thrown if token is + `None` or token is an organization token (starting with `"api_org***"`). + + In addition to the auth header, a user-agent is added to provide information about + the installed packages (versions of python, huggingface_hub, torch, tensorflow, + fastai and fastcore). + + Args: + token (`str`, `bool`, *optional*): + The token to be sent in authorization header for the Hub call: + - if a string, it is used as the Hugging Face token + - if `True`, the token is read from the machine (cache or env variable) + - if `False`, authorization header is not set + - if `None`, the token is read from the machine only except if + `HF_HUB_DISABLE_IMPLICIT_TOKEN` env variable is set. + library_name (`str`, *optional*): + The name of the library that is making the HTTP request. Will be added to + the user-agent header. + library_version (`str`, *optional*): + The version of the library that is making the HTTP request. Will be added + to the user-agent header. + user_agent (`str`, `dict`, *optional*): + The user agent info in the form of a dictionary or a single string. It will + be completed with information about the installed packages. + headers (`dict`, *optional*): + Additional headers to include in the request. Those headers take precedence + over the ones generated by this function. + is_write_action (`bool`): + Ignored and deprecated argument. + + Returns: + A `Dict` of headers to pass in your API call. 
+ + Example: + ```py + >>> build_hf_headers(token="hf_***") # explicit token + {"authorization": "Bearer hf_***", "user-agent": ""} + + >>> build_hf_headers(token=True) # explicitly use cached token + {"authorization": "Bearer hf_***",...} + + >>> build_hf_headers(token=False) # explicitly don't use cached token + {"user-agent": ...} + + >>> build_hf_headers() # implicit use of the cached token + {"authorization": "Bearer hf_***",...} + + # HF_HUB_DISABLE_IMPLICIT_TOKEN=True # to set as env variable + >>> build_hf_headers() # token is not sent + {"user-agent": ...} + + >>> build_hf_headers(library_name="transformers", library_version="1.2.3") + {"authorization": ..., "user-agent": "transformers/1.2.3; hf_hub/0.10.2; python/3.10.4; tensorflow/1.55"} + ``` + + Raises: + [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError) + If organization token is passed and "write" access is required. + [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError) + If "write" access is required but token is not passed and not saved locally. + [`EnvironmentError`](https://docs.python.org/3/library/exceptions.html#EnvironmentError) + If `token=True` but token is not saved locally. + """ + # Get auth token to send + token_to_send = get_token_to_send(token) + + # Combine headers + hf_headers = { + "user-agent": _http_user_agent( + library_name=library_name, + library_version=library_version, + user_agent=user_agent, + ) + } + if token_to_send is not None: + hf_headers["authorization"] = f"Bearer {token_to_send}" + if headers is not None: + hf_headers.update(headers) + return hf_headers + + +def get_token_to_send(token: Optional[Union[bool, str]]) -> Optional[str]: + """Select the token to send from either `token` or the cache.""" + # Case token is explicitly provided + if isinstance(token, str): + return token + + # Case token is explicitly forbidden + if token is False: + return None + + # Token is not provided: we get it from local cache + cached_token = get_token() + + # Case token is explicitly required + if token is True: + if cached_token is None: + raise LocalTokenNotFoundError( + "Token is required (`token=True`), but no token found. You" + " need to provide a token or be logged in to Hugging Face with" + " `hf auth login` or `huggingface_hub.login`. See" + " https://huggingface.co/settings/tokens." + ) + return cached_token + + # Case implicit use of the token is forbidden by env variable + if constants.HF_HUB_DISABLE_IMPLICIT_TOKEN: + return None + + # Otherwise: we use the cached token as the user has not explicitly forbidden it + return cached_token + + +def _http_user_agent( + *, + library_name: Optional[str] = None, + library_version: Optional[str] = None, + user_agent: Union[Dict, str, None] = None, +) -> str: + """Format a user-agent string containing information about the installed packages. + + Args: + library_name (`str`, *optional*): + The name of the library that is making the HTTP request. + library_version (`str`, *optional*): + The version of the library that is making the HTTP request. + user_agent (`str`, `dict`, *optional*): + The user agent info in the form of a dictionary or a single string. + + Returns: + The formatted user-agent string. 
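+
+    Example (illustrative; the exact segments and versions depend on the running environment):
+
+    ```py
+    >>> _http_user_agent(library_name="transformers", library_version="4.40.0")
+    'transformers/4.40.0; hf_hub/0.26.0; python/3.12.1; torch/2.2.0'
+    ```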
+ """ + if library_name is not None: + ua = f"{library_name}/{library_version}" + else: + ua = "unknown/None" + ua += f"; hf_hub/{get_hf_hub_version()}" + ua += f"; python/{get_python_version()}" + + if not constants.HF_HUB_DISABLE_TELEMETRY: + if is_torch_available(): + ua += f"; torch/{get_torch_version()}" + if is_tf_available(): + ua += f"; tensorflow/{get_tf_version()}" + if is_fastai_available(): + ua += f"; fastai/{get_fastai_version()}" + if is_fastcore_available(): + ua += f"; fastcore/{get_fastcore_version()}" + + if isinstance(user_agent, dict): + ua += "; " + "; ".join(f"{k}/{v}" for k, v in user_agent.items()) + elif isinstance(user_agent, str): + ua += "; " + user_agent + + # Retrieve user-agent origin headers from environment variable + origin = constants.HF_HUB_USER_AGENT_ORIGIN + if origin is not None: + ua += "; origin/" + origin + + return _deduplicate_user_agent(ua) + + +def _deduplicate_user_agent(user_agent: str) -> str: + """Deduplicate redundant information in the generated user-agent.""" + # Split around ";" > Strip whitespaces > Store as dict keys (ensure unicity) > format back as string + # Order is implicitly preserved by dictionary structure (see https://stackoverflow.com/a/53657523). + return "; ".join({key.strip(): None for key in user_agent.split(";")}.keys()) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_hf_folder.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_hf_folder.py new file mode 100644 index 0000000000000000000000000000000000000000..6418bf2fd2c59b4bcf301c1dd82bc468f2f42ddf --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_hf_folder.py @@ -0,0 +1,68 @@ +# coding=utf-8 +# Copyright 2022-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contain helper class to retrieve/store token from/to local cache.""" + +from pathlib import Path +from typing import Optional + +from .. import constants +from ._auth import get_token + + +class HfFolder: + # TODO: deprecate when adapted in transformers/datasets/gradio + # @_deprecate_method(version="1.0", message="Use `huggingface_hub.login` instead.") + @classmethod + def save_token(cls, token: str) -> None: + """ + Save token, creating folder as needed. + + Token is saved in the huggingface home folder. You can configure it by setting + the `HF_HOME` environment variable. + + Args: + token (`str`): + The token to save to the [`HfFolder`] + """ + path_token = Path(constants.HF_TOKEN_PATH) + path_token.parent.mkdir(parents=True, exist_ok=True) + path_token.write_text(token) + + # TODO: deprecate when adapted in transformers/datasets/gradio + # @_deprecate_method(version="1.0", message="Use `huggingface_hub.get_token` instead.") + @classmethod + def get_token(cls) -> Optional[str]: + """ + Get token or None if not existent. + + This method is deprecated in favor of [`huggingface_hub.get_token`] but is kept for backward compatibility. 
+ Its behavior is the same as [`huggingface_hub.get_token`]. + + Returns: + `str` or `None`: The token, `None` if it doesn't exist. + """ + return get_token() + + # TODO: deprecate when adapted in transformers/datasets/gradio + # @_deprecate_method(version="1.0", message="Use `huggingface_hub.logout` instead.") + @classmethod + def delete_token(cls) -> None: + """ + Deletes the token from storage. Does not fail if token does not exist. + """ + try: + Path(constants.HF_TOKEN_PATH).unlink() + except FileNotFoundError: + pass diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_http.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_http.py new file mode 100644 index 0000000000000000000000000000000000000000..3471031b34b15efe9d5fc76077eac467bbb03500 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_http.py @@ -0,0 +1,638 @@ +# coding=utf-8 +# Copyright 2022-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains utilities to handle HTTP requests in Huggingface Hub.""" + +import io +import os +import re +import threading +import time +import uuid +from functools import lru_cache +from shlex import quote +from typing import Any, Callable, List, Optional, Tuple, Type, Union + +import requests +from requests import HTTPError, Response +from requests.adapters import HTTPAdapter +from requests.models import PreparedRequest + +from huggingface_hub.errors import OfflineModeIsEnabled + +from .. import constants +from ..errors import ( + BadRequestError, + DisabledRepoError, + EntryNotFoundError, + GatedRepoError, + HfHubHTTPError, + RepositoryNotFoundError, + RevisionNotFoundError, +) +from . import logging +from ._fixes import JSONDecodeError +from ._lfs import SliceFileObj +from ._typing import HTTP_METHOD_T + + +logger = logging.get_logger(__name__) + +# Both headers are used by the Hub to debug failed requests. +# `X_AMZN_TRACE_ID` is better as it also works to debug on Cloudfront and ALB. +# If `X_AMZN_TRACE_ID` is set, the Hub will use it as well. +X_AMZN_TRACE_ID = "X-Amzn-Trace-Id" +X_REQUEST_ID = "x-request-id" +X_AMZ_CF_ID = "x-amz-cf-id" + +REPO_API_REGEX = re.compile( + r""" + # staging or production endpoint + ^https://[^/]+ + ( + # on /api/repo_type/repo_id + /api/(models|datasets|spaces)/(.+) + | + # or /repo_id/resolve/revision/... 
+ /(.+)/resolve/(.+) + ) + """, + flags=re.VERBOSE, +) + + +class UniqueRequestIdAdapter(HTTPAdapter): + X_AMZN_TRACE_ID = "X-Amzn-Trace-Id" + + def add_headers(self, request, **kwargs): + super().add_headers(request, **kwargs) + + # Add random request ID => easier for server-side debug + if X_AMZN_TRACE_ID not in request.headers: + request.headers[X_AMZN_TRACE_ID] = request.headers.get(X_REQUEST_ID) or str(uuid.uuid4()) + + # Add debug log + has_token = len(str(request.headers.get("authorization", ""))) > 0 + logger.debug( + f"Request {request.headers[X_AMZN_TRACE_ID]}: {request.method} {request.url} (authenticated: {has_token})" + ) + + def send(self, request: PreparedRequest, *args, **kwargs) -> Response: + """Catch any RequestException to append request id to the error message for debugging.""" + if constants.HF_DEBUG: + logger.debug(f"Send: {_curlify(request)}") + try: + return super().send(request, *args, **kwargs) + except requests.RequestException as e: + request_id = request.headers.get(X_AMZN_TRACE_ID) + if request_id is not None: + # Taken from https://stackoverflow.com/a/58270258 + e.args = (*e.args, f"(Request ID: {request_id})") + raise + + +class OfflineAdapter(HTTPAdapter): + def send(self, request: PreparedRequest, *args, **kwargs) -> Response: + raise OfflineModeIsEnabled( + f"Cannot reach {request.url}: offline mode is enabled. To disable it, please unset the `HF_HUB_OFFLINE` environment variable." + ) + + +def _default_backend_factory() -> requests.Session: + session = requests.Session() + if constants.HF_HUB_OFFLINE: + session.mount("http://", OfflineAdapter()) + session.mount("https://", OfflineAdapter()) + else: + session.mount("http://", UniqueRequestIdAdapter()) + session.mount("https://", UniqueRequestIdAdapter()) + return session + + +BACKEND_FACTORY_T = Callable[[], requests.Session] +_GLOBAL_BACKEND_FACTORY: BACKEND_FACTORY_T = _default_backend_factory + + +def configure_http_backend(backend_factory: BACKEND_FACTORY_T = _default_backend_factory) -> None: + """ + Configure the HTTP backend by providing a `backend_factory`. Any HTTP calls made by `huggingface_hub` will use a + Session object instantiated by this factory. This can be useful if you are running your scripts in a specific + environment requiring custom configuration (e.g. custom proxy or certifications). + + Use [`get_session`] to get a configured Session. Since `requests.Session` is not guaranteed to be thread-safe, + `huggingface_hub` creates 1 Session instance per thread. They are all instantiated using the same `backend_factory` + set in [`configure_http_backend`]. A LRU cache is used to cache the created sessions (and connections) between + calls. Max size is 128 to avoid memory leaks if thousands of threads are spawned. + + See [this issue](https://github.com/psf/requests/issues/2766) to know more about thread-safety in `requests`. 
+
+    Example:
+    ```py
+    import requests
+    from huggingface_hub import configure_http_backend, get_session
+
+    # Create a factory function that returns a Session with configured proxies
+    def backend_factory() -> requests.Session:
+        session = requests.Session()
+        session.proxies = {"http": "http://10.10.1.10:3128", "https": "https://10.10.1.11:1080"}
+        return session
+
+    # Set it as the default session factory
+    configure_http_backend(backend_factory=backend_factory)
+
+    # In practice, this is mostly done internally in `huggingface_hub`
+    session = get_session()
+    ```
+    """
+    global _GLOBAL_BACKEND_FACTORY
+    _GLOBAL_BACKEND_FACTORY = backend_factory
+    reset_sessions()
+
+
+def get_session() -> requests.Session:
+    """
+    Get a `requests.Session` object, using the session factory from the user.
+
+    Use [`get_session`] to get a configured Session. Since `requests.Session` is not guaranteed to be thread-safe,
+    `huggingface_hub` creates 1 Session instance per thread. They are all instantiated using the same `backend_factory`
+    set in [`configure_http_backend`]. A LRU cache is used to cache the created sessions (and connections) between
+    calls. Max size is 128 to avoid memory leaks if thousands of threads are spawned.
+
+    See [this issue](https://github.com/psf/requests/issues/2766) to know more about thread-safety in `requests`.
+
+    Example:
+    ```py
+    import requests
+    from huggingface_hub import configure_http_backend, get_session
+
+    # Create a factory function that returns a Session with configured proxies
+    def backend_factory() -> requests.Session:
+        session = requests.Session()
+        session.proxies = {"http": "http://10.10.1.10:3128", "https": "https://10.10.1.11:1080"}
+        return session
+
+    # Set it as the default session factory
+    configure_http_backend(backend_factory=backend_factory)
+
+    # In practice, this is mostly done internally in `huggingface_hub`
+    session = get_session()
+    ```
+    """
+    return _get_session_from_cache(process_id=os.getpid(), thread_id=threading.get_ident())
+
+
+def reset_sessions() -> None:
+    """Reset the cache of sessions.
+
+    Mostly used internally when sessions are reconfigured or an SSLError is raised.
+    See [`configure_http_backend`] for more details.
+    """
+    _get_session_from_cache.cache_clear()
+
+
+@lru_cache
+def _get_session_from_cache(process_id: int, thread_id: int) -> requests.Session:
+    """
+    Create a new session per thread using global factory. Using LRU cache (maxsize 128) to avoid memory leaks when
+    using thousands of threads. Cache is cleared when `configure_http_backend` is called.
+    """
+    return _GLOBAL_BACKEND_FACTORY()
+
+
+def http_backoff(
+    method: HTTP_METHOD_T,
+    url: str,
+    *,
+    max_retries: int = 5,
+    base_wait_time: float = 1,
+    max_wait_time: float = 8,
+    retry_on_exceptions: Union[Type[Exception], Tuple[Type[Exception], ...]] = (
+        requests.Timeout,
+        requests.ConnectionError,
+        requests.exceptions.ChunkedEncodingError,
+    ),
+    retry_on_status_codes: Union[int, Tuple[int, ...]] = (500, 502, 503, 504),
+    **kwargs,
+) -> Response:
+    """Wrapper around requests to retry calls on an endpoint, with exponential backoff.
+
+    Endpoint call is retried on exceptions (ex: connection timeout, proxy error,...)
+    and/or on specific status codes (ex: service unavailable). If the call failed more
+    than `max_retries`, the exception is thrown or `raise_for_status` is called on the
+    response object.
+
+    Re-implement mechanisms from the `backoff` library to avoid adding an external
+    dependency to `huggingface_hub`.
+    See https://github.com/litl/backoff.
+
+    Args:
+        method (`Literal["GET", "OPTIONS", "HEAD", "POST", "PUT", "PATCH", "DELETE"]`):
+            HTTP method to perform.
+        url (`str`):
+            The URL of the resource to fetch.
+        max_retries (`int`, *optional*, defaults to `5`):
+            Maximum number of retries.
+        base_wait_time (`float`, *optional*, defaults to `1`):
+            Duration (in seconds) to wait before retrying the first time.
+            Wait time between retries then grows exponentially, capped by
+            `max_wait_time`.
+        max_wait_time (`float`, *optional*, defaults to `8`):
+            Maximum duration (in seconds) to wait before retrying.
+        retry_on_exceptions (`Type[Exception]` or `Tuple[Type[Exception]]`, *optional*):
+            Define which exceptions must be caught to retry the request. Can be a single type or a tuple of types.
+            By default, retry on `requests.Timeout`, `requests.ConnectionError` and `requests.exceptions.ChunkedEncodingError`.
+        retry_on_status_codes (`int` or `Tuple[int]`, *optional*, defaults to `(500, 502, 503, 504)`):
+            Define on which status codes the request must be retried. By default, 5xx errors are retried.
+        **kwargs (`dict`, *optional*):
+            kwargs to pass to `requests.request`.
+
+    Example:
+    ```
+    >>> from huggingface_hub.utils import http_backoff
+
+    # Same usage as "requests.request".
+    >>> response = http_backoff("GET", "https://www.google.com")
+    >>> response.raise_for_status()
+
+    # If you expect a Gateway Timeout from time to time
+    >>> response = http_backoff("PUT", upload_url, data=data, retry_on_status_codes=504)
+    >>> response.raise_for_status()
+    ```
+
+    > [!WARNING]
+    > When using `requests` it is possible to stream data by passing an iterator to the
+    > `data` argument. On http backoff this is a problem as the iterator is not reset
+    > after a failed call. This issue is mitigated for file objects or any IO streams
+    > by saving the initial position of the cursor (with `data.tell()`) and resetting the
+    > cursor between each call (with `data.seek()`). For arbitrary iterators, http backoff
+    > will fail. If this is a hard constraint for you, please let us know by opening an
+    > issue on [Github](https://github.com/huggingface/huggingface_hub).
+    """
+    if isinstance(retry_on_exceptions, type):  # Tuple from single exception type
+        retry_on_exceptions = (retry_on_exceptions,)
+
+    if isinstance(retry_on_status_codes, int):  # Tuple from single status code
+        retry_on_status_codes = (retry_on_status_codes,)
+
+    nb_tries = 0
+    sleep_time = base_wait_time
+
+    # If `data` is used and is a file object (or any IO), it will be consumed on the
+    # first HTTP request. We need to save the initial position so that the full content
+    # of the file is re-sent on http backoff. See warning tip in docstring.
+    io_obj_initial_pos = None
+    if "data" in kwargs and isinstance(kwargs["data"], (io.IOBase, SliceFileObj)):
+        io_obj_initial_pos = kwargs["data"].tell()
+
+    session = get_session()
+    while True:
+        nb_tries += 1
+        try:
+            # If `data` is used and is a file object (or any IO), set back cursor to
+            # initial position.
+            if io_obj_initial_pos is not None:
+                kwargs["data"].seek(io_obj_initial_pos)
+
+            # Perform request and return if status_code is not in the retry list.
+            response = session.request(method=method, url=url, **kwargs)
+            if response.status_code not in retry_on_status_codes:
+                return response
+
+            # Wrong status code returned (HTTP 503 for instance)
+            logger.warning(f"HTTP Error {response.status_code} thrown while requesting {method} {url}")
+            if nb_tries > max_retries:
+                response.raise_for_status()  # Will raise uncaught exception
+                # We return response to avoid infinite loop in the corner case where the
+                # user asks for retry on a status code that doesn't raise_for_status.
                return response

+        except retry_on_exceptions as err:
+            logger.warning(f"'{err}' thrown while requesting {method} {url}")
+
+            if isinstance(err, requests.ConnectionError):
+                reset_sessions()  # In case of SSLError it's best to reset the shared requests.Session objects
+
+            if nb_tries > max_retries:
+                raise err
+
+        # Sleep for X seconds
+        logger.warning(f"Retrying in {sleep_time}s [Retry {nb_tries}/{max_retries}].")
+        time.sleep(sleep_time)
+
+        # Update sleep time for next retry
+        sleep_time = min(max_wait_time, sleep_time * 2)  # Exponential backoff
+
+
+def fix_hf_endpoint_in_url(url: str, endpoint: Optional[str]) -> str:
+    """Replace the default endpoint in a URL by a custom one.
+
+    This is useful when using a proxy and the Hugging Face Hub returns a URL with the default endpoint.
+    """
+    endpoint = endpoint.rstrip("/") if endpoint else constants.ENDPOINT
+    # check if a proxy has been set => if yes, update the returned URL to use the proxy
+    if endpoint not in (constants._HF_DEFAULT_ENDPOINT, constants._HF_DEFAULT_STAGING_ENDPOINT):
+        url = url.replace(constants._HF_DEFAULT_ENDPOINT, endpoint)
+        url = url.replace(constants._HF_DEFAULT_STAGING_ENDPOINT, endpoint)
+    return url
+
+
+def hf_raise_for_status(response: Response, endpoint_name: Optional[str] = None) -> None:
+    """
+    Internal version of `response.raise_for_status()` that will refine a
+    potential HTTPError. Raised exception will be an instance of `HfHubHTTPError`.
+
+    This helper is meant to be the unique method to raise_for_status when making a call
+    to the Hugging Face Hub.
+
+    Example:
+    ```py
+    import requests
+    from huggingface_hub.utils import get_session, hf_raise_for_status, HfHubHTTPError
+
+    response = get_session().post(...)
+    try:
+        hf_raise_for_status(response)
+    except HfHubHTTPError as e:
+        print(str(e))  # formatted message
+        e.request_id, e.server_message  # details returned by server
+
+        # Complete the error message with additional information once it's raised
+        e.append_to_message("\n`create_commit` expects the repository to exist.")
+        raise
+    ```
+
+    Args:
+        response (`Response`):
+            Response from the server.
+        endpoint_name (`str`, *optional*):
+            Name of the endpoint that has been called. If provided, the error message
+            will be more complete.
+
+    > [!WARNING]
+    > Raises when the request has failed:
+    >
+    >     - [`~utils.RepositoryNotFoundError`]
+    >       If the repository to download from cannot be found. This may be because it
+    >       doesn't exist, because `repo_type` is not set correctly, or because the repo
+    >       is `private` and you do not have access.
+    >     - [`~utils.GatedRepoError`]
+    >       If the repository exists but is gated and the user is not on the authorized
+    >       list.
+    >     - [`~utils.RevisionNotFoundError`]
+    >       If the repository exists but the revision couldn't be found.
+    >     - [`~utils.EntryNotFoundError`]
+    >       If the repository exists but the entry (e.g. the requested file) couldn't be
+    >       found.
+    >     - [`~utils.BadRequestError`]
+    >       If request failed with an HTTP 400 BadRequest error.
+ > - [`~utils.HfHubHTTPError`] + > If request failed for a reason not listed above. + """ + try: + response.raise_for_status() + except HTTPError as e: + error_code = response.headers.get("X-Error-Code") + error_message = response.headers.get("X-Error-Message") + + if error_code == "RevisionNotFound": + message = f"{response.status_code} Client Error." + "\n\n" + f"Revision Not Found for url: {response.url}." + raise _format(RevisionNotFoundError, message, response) from e + + elif error_code == "EntryNotFound": + message = f"{response.status_code} Client Error." + "\n\n" + f"Entry Not Found for url: {response.url}." + raise _format(EntryNotFoundError, message, response) from e + + elif error_code == "GatedRepo": + message = ( + f"{response.status_code} Client Error." + "\n\n" + f"Cannot access gated repo for url {response.url}." + ) + raise _format(GatedRepoError, message, response) from e + + elif error_message == "Access to this resource is disabled.": + message = ( + f"{response.status_code} Client Error." + + "\n\n" + + f"Cannot access repository for url {response.url}." + + "\n" + + "Access to this resource is disabled." + ) + raise _format(DisabledRepoError, message, response) from e + + elif error_code == "RepoNotFound" or ( + response.status_code == 401 + and error_message != "Invalid credentials in Authorization header" + and response.request is not None + and response.request.url is not None + and REPO_API_REGEX.search(response.request.url) is not None + ): + # 401 is misleading as it is returned for: + # - private and gated repos if user is not authenticated + # - missing repos + # => for now, we process them as `RepoNotFound` anyway. + # See https://gist.github.com/Wauplin/46c27ad266b15998ce56a6603796f0b9 + message = ( + f"{response.status_code} Client Error." + + "\n\n" + + f"Repository Not Found for url: {response.url}." + + "\nPlease make sure you specified the correct `repo_id` and" + " `repo_type`.\nIf you are trying to access a private or gated repo," + " make sure you are authenticated. For more details, see" + " https://huggingface.co/docs/huggingface_hub/authentication" + ) + raise _format(RepositoryNotFoundError, message, response) from e + + elif response.status_code == 400: + message = ( + f"\n\nBad request for {endpoint_name} endpoint:" if endpoint_name is not None else "\n\nBad request:" + ) + raise _format(BadRequestError, message, response) from e + + elif response.status_code == 403: + message = ( + f"\n\n{response.status_code} Forbidden: {error_message}." + + f"\nCannot access content at: {response.url}." + + "\nMake sure your token has the correct permissions." + ) + raise _format(HfHubHTTPError, message, response) from e + + elif response.status_code == 416: + range_header = response.request.headers.get("Range") + message = f"{e}. Requested range: {range_header}. Content-Range: {response.headers.get('Content-Range')}." 
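+            # (HTTP 416 = Range Not Satisfiable: the server cannot serve the requested byte range.)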
+ raise _format(HfHubHTTPError, message, response) from e + + # Convert `HTTPError` into a `HfHubHTTPError` to display request information + # as well (request id and/or server error message) + raise _format(HfHubHTTPError, str(e), response) from e + + +def _format(error_type: Type[HfHubHTTPError], custom_message: str, response: Response) -> HfHubHTTPError: + server_errors = [] + + # Retrieve server error from header + from_headers = response.headers.get("X-Error-Message") + if from_headers is not None: + server_errors.append(from_headers) + + # Retrieve server error from body + try: + # Case errors are returned in a JSON format + data = response.json() + + error = data.get("error") + if error is not None: + if isinstance(error, list): + # Case {'error': ['my error 1', 'my error 2']} + server_errors.extend(error) + else: + # Case {'error': 'my error'} + server_errors.append(error) + + errors = data.get("errors") + if errors is not None: + # Case {'errors': [{'message': 'my error 1'}, {'message': 'my error 2'}]} + for error in errors: + if "message" in error: + server_errors.append(error["message"]) + + except JSONDecodeError: + # If content is not JSON and not HTML, append the text + content_type = response.headers.get("Content-Type", "") + if response.text and "html" not in content_type.lower(): + server_errors.append(response.text) + + # Strip all server messages + server_errors = [str(line).strip() for line in server_errors if str(line).strip()] + + # Deduplicate server messages (keep order) + # taken from https://stackoverflow.com/a/17016257 + server_errors = list(dict.fromkeys(server_errors)) + + # Format server error + server_message = "\n".join(server_errors) + + # Add server error to custom message + final_error_message = custom_message + if server_message and server_message.lower() not in custom_message.lower(): + if "\n\n" in custom_message: + final_error_message += "\n" + server_message + else: + final_error_message += "\n\n" + server_message + + # Prepare Request ID message + request_id = "" + request_id_message = "" + for header, label in ( + (X_REQUEST_ID, "Request ID"), + (X_AMZN_TRACE_ID, "Amzn Trace ID"), + (X_AMZ_CF_ID, "Amz CF ID"), + ): + value = response.headers.get(header) + if value: + request_id = str(value) + request_id_message = f" ({label}: {value})" + break + + # Add Request ID + if request_id and request_id.lower() not in final_error_message.lower(): + if "\n" in final_error_message: + newline_index = final_error_message.index("\n") + final_error_message = ( + final_error_message[:newline_index] + request_id_message + final_error_message[newline_index:] + ) + else: + final_error_message += request_id_message + + # Return + return error_type(final_error_message.strip(), response=response, server_message=server_message or None) + + +def _curlify(request: requests.PreparedRequest) -> str: + """Convert a `requests.PreparedRequest` into a curl command (str). + + Used for debug purposes only. + + Implementation vendored from https://github.com/ofw/curlify/blob/master/curlify.py. + MIT License Copyright (c) 2016 Egor. + """ + parts: List[Tuple[Any, Any]] = [ + ("curl", None), + ("-X", request.method), + ] + + for k, v in sorted(request.headers.items()): + if k.lower() == "authorization": + v = "" # Hide authorization header, no matter its value (can be Bearer, Key, etc.) 
+ parts += [("-H", "{0}: {1}".format(k, v))] + + if request.body: + body = request.body + if isinstance(body, bytes): + body = body.decode("utf-8", errors="ignore") + elif hasattr(body, "read"): + body = "" # Don't try to read it to avoid consuming the stream + if len(body) > 1000: + body = body[:1000] + " ... [truncated]" + parts += [("-d", body.replace("\n", ""))] + + parts += [(None, request.url)] + + flat_parts = [] + for k, v in parts: + if k: + flat_parts.append(quote(k)) + if v: + flat_parts.append(quote(v)) + + return " ".join(flat_parts) + + +# Regex to parse HTTP Range header +RANGE_REGEX = re.compile(r"^\s*bytes\s*=\s*(\d*)\s*-\s*(\d*)\s*$", re.IGNORECASE) + + +def _adjust_range_header(original_range: Optional[str], resume_size: int) -> Optional[str]: + """ + Adjust HTTP Range header to account for resume position. + """ + if not original_range: + return f"bytes={resume_size}-" + + if "," in original_range: + raise ValueError(f"Multiple ranges detected - {original_range!r}, not supported yet.") + + match = RANGE_REGEX.match(original_range) + if not match: + raise RuntimeError(f"Invalid range format - {original_range!r}.") + start, end = match.groups() + + if not start: + if not end: + raise RuntimeError(f"Invalid range format - {original_range!r}.") + + new_suffix = int(end) - resume_size + new_range = f"bytes=-{new_suffix}" + if new_suffix <= 0: + raise RuntimeError(f"Empty new range - {new_range!r}.") + return new_range + + start = int(start) + new_start = start + resume_size + if end: + end = int(end) + new_range = f"bytes={new_start}-{end}" + if new_start > end: + raise RuntimeError(f"Empty new range - {new_range!r}.") + return new_range + + return f"bytes={new_start}-" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_lfs.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_lfs.py new file mode 100644 index 0000000000000000000000000000000000000000..307f371ffa79a8ae726ee03458c52e230a792898 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_lfs.py @@ -0,0 +1,110 @@ +# coding=utf-8 +# Copyright 2019-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Git LFS related utilities""" + +import io +import os +from contextlib import AbstractContextManager +from typing import BinaryIO + + +class SliceFileObj(AbstractContextManager): + """ + Utility context manager to read a *slice* of a seekable file-like object as a seekable, file-like object. + + This is NOT thread safe + + Inspired by stackoverflow.com/a/29838711/593036 + + Credits to @julien-c + + Args: + fileobj (`BinaryIO`): + A file-like object to slice. MUST implement `tell()` and `seek()` (and `read()` of course). + `fileobj` will be reset to its original position when exiting the context manager. + seek_from (`int`): + The start of the slice (offset from position 0 in bytes). + read_limit (`int`): + The maximum number of bytes to read from the slice. 
+ + Attributes: + previous_position (`int`): + The previous position + + Examples: + + Reading 200 bytes with an offset of 128 bytes from a file (ie bytes 128 to 327): + ```python + >>> with open("path/to/file", "rb") as file: + ... with SliceFileObj(file, seek_from=128, read_limit=200) as fslice: + ... fslice.read(...) + ``` + + Reading a file in chunks of 512 bytes + ```python + >>> import os + >>> chunk_size = 512 + >>> file_size = os.getsize("path/to/file") + >>> with open("path/to/file", "rb") as file: + ... for chunk_idx in range(ceil(file_size / chunk_size)): + ... with SliceFileObj(file, seek_from=chunk_idx * chunk_size, read_limit=chunk_size) as fslice: + ... chunk = fslice.read(...) + + ``` + """ + + def __init__(self, fileobj: BinaryIO, seek_from: int, read_limit: int): + self.fileobj = fileobj + self.seek_from = seek_from + self.read_limit = read_limit + + def __enter__(self): + self._previous_position = self.fileobj.tell() + end_of_stream = self.fileobj.seek(0, os.SEEK_END) + self._len = min(self.read_limit, end_of_stream - self.seek_from) + # ^^ The actual number of bytes that can be read from the slice + self.fileobj.seek(self.seek_from, io.SEEK_SET) + return self + + def __exit__(self, exc_type, exc_value, traceback): + self.fileobj.seek(self._previous_position, io.SEEK_SET) + + def read(self, n: int = -1): + pos = self.tell() + if pos >= self._len: + return b"" + remaining_amount = self._len - pos + data = self.fileobj.read(remaining_amount if n < 0 else min(n, remaining_amount)) + return data + + def tell(self) -> int: + return self.fileobj.tell() - self.seek_from + + def seek(self, offset: int, whence: int = os.SEEK_SET) -> int: + start = self.seek_from + end = start + self._len + if whence in (os.SEEK_SET, os.SEEK_END): + offset = start + offset if whence == os.SEEK_SET else end + offset + offset = max(start, min(offset, end)) + whence = os.SEEK_SET + elif whence == os.SEEK_CUR: + cur_pos = self.fileobj.tell() + offset = max(start - cur_pos, min(offset, end - cur_pos)) + else: + raise ValueError(f"whence value {whence} is not supported") + return self.fileobj.seek(offset, whence) - self.seek_from + + def __iter__(self): + yield self.read(n=4 * 1024 * 1024) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_pagination.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_pagination.py new file mode 100644 index 0000000000000000000000000000000000000000..3ef2b6668ba09d4c6a715509131d157139a1fac0 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_pagination.py @@ -0,0 +1,52 @@ +# coding=utf-8 +# Copyright 2022-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains utilities to handle pagination on Huggingface Hub.""" + +from typing import Dict, Iterable, Optional + +import requests + +from . 
import get_session, hf_raise_for_status, http_backoff, logging + + +logger = logging.get_logger(__name__) + + +def paginate(path: str, params: Dict, headers: Dict) -> Iterable: + """Fetch a list of models/datasets/spaces and paginate through results. + + This is using the same "Link" header format as GitHub. + See: + - https://requests.readthedocs.io/en/latest/api/#requests.Response.links + - https://docs.github.com/en/rest/guides/traversing-with-pagination#link-header + """ + session = get_session() + r = session.get(path, params=params, headers=headers) + hf_raise_for_status(r) + yield from r.json() + + # Follow pages + # Next link already contains query params + next_page = _get_next_page(r) + while next_page is not None: + logger.debug(f"Pagination detected. Requesting next page: {next_page}") + r = http_backoff("GET", next_page, max_retries=20, retry_on_status_codes=429, headers=headers) + hf_raise_for_status(r) + yield from r.json() + next_page = _get_next_page(r) + + +def _get_next_page(response: requests.Response) -> Optional[str]: + return response.links.get("next", {}).get("url") diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_paths.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_paths.py new file mode 100644 index 0000000000000000000000000000000000000000..4f2c0ebce070bbde4900e919a3aca7cfc331e747 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_paths.py @@ -0,0 +1,141 @@ +# coding=utf-8 +# Copyright 2022-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains utilities to handle paths in Huggingface Hub.""" + +from fnmatch import fnmatch +from pathlib import Path +from typing import Callable, Generator, Iterable, List, Optional, TypeVar, Union + + +T = TypeVar("T") + +# Always ignore `.git` and `.cache/huggingface` folders in commits +DEFAULT_IGNORE_PATTERNS = [ + ".git", + ".git/*", + "*/.git", + "**/.git/**", + ".cache/huggingface", + ".cache/huggingface/*", + "*/.cache/huggingface", + "**/.cache/huggingface/**", +] +# Forbidden to commit these folders +FORBIDDEN_FOLDERS = [".git", ".cache"] + + +def filter_repo_objects( + items: Iterable[T], + *, + allow_patterns: Optional[Union[List[str], str]] = None, + ignore_patterns: Optional[Union[List[str], str]] = None, + key: Optional[Callable[[T], str]] = None, +) -> Generator[T, None, None]: + """Filter repo objects based on an allowlist and a denylist. + + Input must be a list of paths (`str` or `Path`) or a list of arbitrary objects. + In the later case, `key` must be provided and specifies a function of one argument + that is used to extract a path from each element in iterable. + + Patterns are Unix shell-style wildcards which are NOT regular expressions. See + https://docs.python.org/3/library/fnmatch.html for more details. + + Args: + items (`Iterable`): + List of items to filter. + allow_patterns (`str` or `List[str]`, *optional*): + Patterns constituting the allowlist. 
+            least one pattern from the allowlist.
+        ignore_patterns (`str` or `List[str]`, *optional*):
+            Patterns constituting the denylist. If provided, item paths must not match
+            any patterns from the denylist.
+        key (`Callable[[T], str]`, *optional*):
+            Single-argument function to extract a path from each item. If not provided,
+            the `items` must already be `str` or `Path`.
+
+    Returns:
+        Filtered list of objects, as a generator.
+
+    Raises:
+        :class:`ValueError`:
+            If `key` is not provided and items are not `str` or `Path`.
+
+    Example usage with paths:
+    ```python
+    >>> # Filter only PDFs that are not hidden.
+    >>> list(filter_repo_objects(
+    ...     ["aaa.pdf", "bbb.jpg", ".ccc.pdf", ".ddd.png"],
+    ...     allow_patterns=["*.pdf"],
+    ...     ignore_patterns=[".*"],
+    ... ))
+    ["aaa.pdf"]
+    ```
+
+    Example usage with objects:
+    ```python
+    >>> list(filter_repo_objects(
+    ...     [
+    ...         CommitOperationAdd(path_or_fileobj="/tmp/aaa.pdf", path_in_repo="aaa.pdf"),
+    ...         CommitOperationAdd(path_or_fileobj="/tmp/bbb.jpg", path_in_repo="bbb.jpg"),
+    ...         CommitOperationAdd(path_or_fileobj="/tmp/.ccc.pdf", path_in_repo=".ccc.pdf"),
+    ...         CommitOperationAdd(path_or_fileobj="/tmp/.ddd.png", path_in_repo=".ddd.png"),
+    ...     ],
+    ...     allow_patterns=["*.pdf"],
+    ...     ignore_patterns=[".*"],
+    ...     key=lambda x: x.path_in_repo,
+    ... ))
+    [CommitOperationAdd(path_or_fileobj="/tmp/aaa.pdf", path_in_repo="aaa.pdf")]
+    ```
+    """
+    if isinstance(allow_patterns, str):
+        allow_patterns = [allow_patterns]
+
+    if isinstance(ignore_patterns, str):
+        ignore_patterns = [ignore_patterns]
+
+    if allow_patterns is not None:
+        allow_patterns = [_add_wildcard_to_directories(p) for p in allow_patterns]
+    if ignore_patterns is not None:
+        ignore_patterns = [_add_wildcard_to_directories(p) for p in ignore_patterns]
+
+    if key is None:
+
+        def _identity(item: T) -> str:
+            if isinstance(item, str):
+                return item
+            if isinstance(item, Path):
+                return str(item)
+            raise ValueError(f"Please provide `key` argument in `filter_repo_objects`: `{item}` is not a string.")
+
+        key = _identity  # Items must be `str` or `Path`, otherwise raise ValueError
+
+    for item in items:
+        path = key(item)
+
+        # Skip if there's an allowlist and path doesn't match any
+        if allow_patterns is not None and not any(fnmatch(path, r) for r in allow_patterns):
+            continue
+
+        # Skip if there's a denylist and path matches any
+        if ignore_patterns is not None and any(fnmatch(path, r) for r in ignore_patterns):
+            continue
+
+        yield item
+
+
+def _add_wildcard_to_directories(pattern: str) -> str:
+    if pattern[-1] == "/":
+        return pattern + "*"
+    return pattern
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_runtime.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_runtime.py
new file mode 100644
index 0000000000000000000000000000000000000000..9e38e6da7493074703032150b8b7d6766ed0fed6
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_runtime.py
@@ -0,0 +1,395 @@
+# coding=utf-8
+# Copyright 2022-present, the HuggingFace Inc. team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Check presence of installed packages at runtime."""
+
+import importlib.metadata
+import os
+import platform
+import sys
+import warnings
+from typing import Any, Dict
+
+from .. import __version__, constants
+
+
+_PY_VERSION: str = sys.version.split()[0].rstrip("+")
+
+_package_versions = {}
+
+_CANDIDATES = {
+    "aiohttp": {"aiohttp"},
+    "fastai": {"fastai"},
+    "fastapi": {"fastapi"},
+    "fastcore": {"fastcore"},
+    "gradio": {"gradio"},
+    "graphviz": {"graphviz"},
+    "hf_transfer": {"hf_transfer"},
+    "hf_xet": {"hf_xet"},
+    "jinja": {"Jinja2"},
+    "keras": {"keras"},
+    "numpy": {"numpy"},
+    "pillow": {"Pillow"},
+    "pydantic": {"pydantic"},
+    "pydot": {"pydot"},
+    "safetensors": {"safetensors"},
+    "tensorboard": {"tensorboardX"},
+    "tensorflow": (
+        "tensorflow",
+        "tensorflow-cpu",
+        "tensorflow-gpu",
+        "tf-nightly",
+        "tf-nightly-cpu",
+        "tf-nightly-gpu",
+        "intel-tensorflow",
+        "intel-tensorflow-avx512",
+        "tensorflow-rocm",
+        "tensorflow-macos",
+    ),
+    "torch": {"torch"},
+}
+
+# Check once at runtime
+for candidate_name, package_names in _CANDIDATES.items():
+    _package_versions[candidate_name] = "N/A"
+    for name in package_names:
+        try:
+            _package_versions[candidate_name] = importlib.metadata.version(name)
+            break
+        except importlib.metadata.PackageNotFoundError:
+            pass
+
+
+def _get_version(package_name: str) -> str:
+    return _package_versions.get(package_name, "N/A")
+
+
+def is_package_available(package_name: str) -> bool:
+    return _get_version(package_name) != "N/A"
+
+
+# Python
+def get_python_version() -> str:
+    return _PY_VERSION
+
+
+# Huggingface Hub
+def get_hf_hub_version() -> str:
+    return __version__
+
+
+# aiohttp
+def is_aiohttp_available() -> bool:
+    return is_package_available("aiohttp")
+
+
+def get_aiohttp_version() -> str:
+    return _get_version("aiohttp")
+
+
+# FastAI
+def is_fastai_available() -> bool:
+    return is_package_available("fastai")
+
+
+def get_fastai_version() -> str:
+    return _get_version("fastai")
+
+
+# FastAPI
+def is_fastapi_available() -> bool:
+    return is_package_available("fastapi")
+
+
+def get_fastapi_version() -> str:
+    return _get_version("fastapi")
+
+
+# Fastcore
+def is_fastcore_available() -> bool:
+    return is_package_available("fastcore")
+
+
+def get_fastcore_version() -> str:
+    return _get_version("fastcore")
+
+
+# Gradio
+def is_gradio_available() -> bool:
+    return is_package_available("gradio")
+
+
+def get_gradio_version() -> str:
+    return _get_version("gradio")
+
+
+# Graphviz
+def is_graphviz_available() -> bool:
+    return is_package_available("graphviz")
+
+
+def get_graphviz_version() -> str:
+    return _get_version("graphviz")
+
+
+# hf_transfer
+def is_hf_transfer_available() -> bool:
+    return is_package_available("hf_transfer")
+
+
+def get_hf_transfer_version() -> str:
+    return _get_version("hf_transfer")
+
+
+# xet
+def is_xet_available() -> bool:
+    # since hf_xet is automatically used if available, allow explicit disabling via environment variable
+    if constants.HF_HUB_DISABLE_XET:
+        return False
+
+    return is_package_available("hf_xet")
+
+
+def get_xet_version() -> str:
+    return _get_version("hf_xet")
+
+
+# Keras
+def is_keras_available() -> bool:
+    return is_package_available("keras")
+
+
+def get_keras_version() -> str:
+    return _get_version("keras")
+
+
+# Numpy
+def is_numpy_available() -> bool:
+    return is_package_available("numpy")
+
+
+def get_numpy_version() -> str:
+    return _get_version("numpy")
+
+
+# Jinja
+def is_jinja_available() -> bool:
+    return is_package_available("jinja")
+
+
+def get_jinja_version() -> str:
+    return _get_version("jinja")
+
+
+# Pillow
+def is_pillow_available() -> bool:
+    return is_package_available("pillow")
+
+
+def get_pillow_version() -> str:
+    return _get_version("pillow")
+
+
+# Pydantic
+def is_pydantic_available() -> bool:
+    if not is_package_available("pydantic"):
+        return False
+    # For Pydantic, we add an extra check to test whether it is correctly installed or not. If both pydantic 2.x and
+    # typing_extensions<=4.5.0 are installed, then pydantic will fail at import time. This should not happen when
+    # it is installed with `pip install huggingface_hub[inference]` but it can happen when it is installed manually
+    # by the user in an environment that we don't control.
+    #
+    # Usually we won't need to do this kind of check on optional dependencies. However, pydantic is a special case
+    # as it is automatically imported when doing `from huggingface_hub import ...` even if the user doesn't use it.
+    #
+    # See https://github.com/huggingface/huggingface_hub/pull/1829 for more details.
+    try:
+        from pydantic import validator  # noqa: F401
+    except ImportError as e:
+        # Example: "ImportError: cannot import name 'TypeAliasType' from 'typing_extensions'"
+        warnings.warn(
+            "Pydantic is installed but cannot be imported. Please check your installation. `huggingface_hub` will "
+            f"default to not using Pydantic. Error message: '{e}'"
+        )
+        return False
+    return True
+
+
+def get_pydantic_version() -> str:
+    return _get_version("pydantic")
+
+
+# Pydot
+def is_pydot_available() -> bool:
+    return is_package_available("pydot")
+
+
+def get_pydot_version() -> str:
+    return _get_version("pydot")
+
+
+# Tensorboard
+def is_tensorboard_available() -> bool:
+    return is_package_available("tensorboard")
+
+
+def get_tensorboard_version() -> str:
+    return _get_version("tensorboard")
+
+
+# Tensorflow
+def is_tf_available() -> bool:
+    return is_package_available("tensorflow")
+
+
+def get_tf_version() -> str:
+    return _get_version("tensorflow")
+
+
+# Torch
+def is_torch_available() -> bool:
+    return is_package_available("torch")
+
+
+def get_torch_version() -> str:
+    return _get_version("torch")
+
+
+# Safetensors
+def is_safetensors_available() -> bool:
+    return is_package_available("safetensors")
+
+
+# Shell-related helpers
+try:
+    # Set to `True` if script is running in a Google Colab notebook.
+    # If running in Google Colab, git credential store is set globally which makes the
+    # warning disappear. See https://github.com/huggingface/huggingface_hub/issues/1043
+    #
+    # Taken from https://stackoverflow.com/a/63519730.
+    _is_google_colab = "google.colab" in str(get_ipython())  # type: ignore # noqa: F821
+except NameError:
+    _is_google_colab = False
+
+
+def is_notebook() -> bool:
+    """Return `True` if code is executed in a notebook (Jupyter, Colab, QTconsole).
+
+    Taken from https://stackoverflow.com/a/39662359.
+    Adapted to make it work with Google colab as well.
+    """
+    try:
+        shell_class = get_ipython().__class__  # type: ignore # noqa: F821
+        for parent_class in shell_class.__mro__:  # i.e. check "is subclass of"
+            if parent_class.__name__ == "ZMQInteractiveShell":
+                return True  # Jupyter notebook, Google colab or qtconsole
+        return False
+    except NameError:
+        return False  # Probably standard Python interpreter
+
+
+def is_google_colab() -> bool:
+    """Return `True` if code is executed in a Google colab.
+
+    Taken from https://stackoverflow.com/a/63519730.
+    """
+    return _is_google_colab
+
+
+def is_colab_enterprise() -> bool:
+    """Return `True` if code is executed in a Google Colab Enterprise environment."""
+    return os.environ.get("VERTEX_PRODUCT") == "COLAB_ENTERPRISE"
+
+
+def dump_environment_info() -> Dict[str, Any]:
+    """Dump information about the machine to help debug issues.
+
+    Similar helpers exist in:
+    - `datasets` (https://github.com/huggingface/datasets/blob/main/src/datasets/commands/env.py)
+    - `diffusers` (https://github.com/huggingface/diffusers/blob/main/src/diffusers/commands/env.py)
+    - `transformers` (https://github.com/huggingface/transformers/blob/main/src/transformers/commands/env.py)
+    """
+    from huggingface_hub import get_token, whoami
+    from huggingface_hub.utils import list_credential_helpers
+
+    token = get_token()
+
+    # Generic machine info
+    info: Dict[str, Any] = {
+        "huggingface_hub version": get_hf_hub_version(),
+        "Platform": platform.platform(),
+        "Python version": get_python_version(),
+    }
+
+    # Interpreter info
+    try:
+        shell_class = get_ipython().__class__  # type: ignore # noqa: F821
+        info["Running in iPython ?"] = "Yes"
+        info["iPython shell"] = shell_class.__name__
+    except NameError:
+        info["Running in iPython ?"] = "No"
+    info["Running in notebook ?"] = "Yes" if is_notebook() else "No"
+    info["Running in Google Colab ?"] = "Yes" if is_google_colab() else "No"
+    info["Running in Google Colab Enterprise ?"] = "Yes" if is_colab_enterprise() else "No"
+    # Login info
+    info["Token path ?"] = constants.HF_TOKEN_PATH
+    info["Has saved token ?"] = token is not None
+    if token is not None:
+        try:
+            info["Who am I ?"] = whoami()["name"]
+        except Exception:
+            pass
+
+    try:
+        info["Configured git credential helpers"] = ", ".join(list_credential_helpers())
+    except Exception:
+        pass
+
+    # Installed dependencies
+    info["FastAI"] = get_fastai_version()
+    info["Tensorflow"] = get_tf_version()
+    info["Torch"] = get_torch_version()
+    info["Jinja2"] = get_jinja_version()
+    info["Graphviz"] = get_graphviz_version()
+    info["keras"] = get_keras_version()
+    info["Pydot"] = get_pydot_version()
+    info["Pillow"] = get_pillow_version()
+    info["hf_transfer"] = get_hf_transfer_version()
+    info["gradio"] = get_gradio_version()
+    info["tensorboard"] = get_tensorboard_version()
+    info["numpy"] = get_numpy_version()
+    info["pydantic"] = get_pydantic_version()
+    info["aiohttp"] = get_aiohttp_version()
+    info["hf_xet"] = get_xet_version()
+
+    # Environment variables
+    info["ENDPOINT"] = constants.ENDPOINT
+    info["HF_HUB_CACHE"] = constants.HF_HUB_CACHE
+    info["HF_ASSETS_CACHE"] = constants.HF_ASSETS_CACHE
+    info["HF_TOKEN_PATH"] = constants.HF_TOKEN_PATH
+    info["HF_STORED_TOKENS_PATH"] = constants.HF_STORED_TOKENS_PATH
+    info["HF_HUB_OFFLINE"] = constants.HF_HUB_OFFLINE
+    info["HF_HUB_DISABLE_TELEMETRY"] = constants.HF_HUB_DISABLE_TELEMETRY
+    info["HF_HUB_DISABLE_PROGRESS_BARS"] = constants.HF_HUB_DISABLE_PROGRESS_BARS
+    info["HF_HUB_DISABLE_SYMLINKS_WARNING"] = constants.HF_HUB_DISABLE_SYMLINKS_WARNING
+    info["HF_HUB_DISABLE_EXPERIMENTAL_WARNING"] = constants.HF_HUB_DISABLE_EXPERIMENTAL_WARNING
+    info["HF_HUB_DISABLE_IMPLICIT_TOKEN"] =
constants.HF_HUB_DISABLE_IMPLICIT_TOKEN + info["HF_HUB_DISABLE_XET"] = constants.HF_HUB_DISABLE_XET + info["HF_HUB_ENABLE_HF_TRANSFER"] = constants.HF_HUB_ENABLE_HF_TRANSFER + info["HF_HUB_ETAG_TIMEOUT"] = constants.HF_HUB_ETAG_TIMEOUT + info["HF_HUB_DOWNLOAD_TIMEOUT"] = constants.HF_HUB_DOWNLOAD_TIMEOUT + + print("\nCopy-and-paste the text below in your GitHub issue.\n") + print("\n".join([f"- {prop}: {val}" for prop, val in info.items()]) + "\n") + return info diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_safetensors.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_safetensors.py new file mode 100644 index 0000000000000000000000000000000000000000..38546c6d34db786c62861e1706f747a21b7012bf --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_safetensors.py @@ -0,0 +1,111 @@ +import functools +import operator +from collections import defaultdict +from dataclasses import dataclass, field +from typing import Dict, List, Literal, Optional, Tuple + + +FILENAME_T = str +TENSOR_NAME_T = str +DTYPE_T = Literal["F64", "F32", "F16", "BF16", "I64", "I32", "I16", "I8", "U8", "BOOL"] + + +@dataclass +class TensorInfo: + """Information about a tensor. + + For more details regarding the safetensors format, check out https://huggingface.co/docs/safetensors/index#format. + + Attributes: + dtype (`str`): + The data type of the tensor ("F64", "F32", "F16", "BF16", "I64", "I32", "I16", "I8", "U8", "BOOL"). + shape (`List[int]`): + The shape of the tensor. + data_offsets (`Tuple[int, int]`): + The offsets of the data in the file as a tuple `[BEGIN, END]`. + parameter_count (`int`): + The number of parameters in the tensor. + """ + + dtype: DTYPE_T + shape: List[int] + data_offsets: Tuple[int, int] + parameter_count: int = field(init=False) + + def __post_init__(self) -> None: + # Taken from https://stackoverflow.com/a/13840436 + try: + self.parameter_count = functools.reduce(operator.mul, self.shape) + except TypeError: + self.parameter_count = 1 # scalar value has no shape + + +@dataclass +class SafetensorsFileMetadata: + """Metadata for a Safetensors file hosted on the Hub. + + This class is returned by [`parse_safetensors_file_metadata`]. + + For more details regarding the safetensors format, check out https://huggingface.co/docs/safetensors/index#format. + + Attributes: + metadata (`Dict`): + The metadata contained in the file. + tensors (`Dict[str, TensorInfo]`): + A map of all tensors. Keys are tensor names and values are information about the corresponding tensor, as a + [`TensorInfo`] object. + parameter_count (`Dict[str, int]`): + A map of the number of parameters per data type. Keys are data types and values are the number of parameters + of that data type. + """ + + metadata: Dict[str, str] + tensors: Dict[TENSOR_NAME_T, TensorInfo] + parameter_count: Dict[DTYPE_T, int] = field(init=False) + + def __post_init__(self) -> None: + parameter_count: Dict[DTYPE_T, int] = defaultdict(int) + for tensor in self.tensors.values(): + parameter_count[tensor.dtype] += tensor.parameter_count + self.parameter_count = dict(parameter_count) + + +@dataclass +class SafetensorsRepoMetadata: + """Metadata for a Safetensors repo. + + A repo is considered to be a Safetensors repo if it contains either a 'model.safetensors' weight file (non-shared + model) or a 'model.safetensors.index.json' index file (sharded model) at its root. + + This class is returned by [`get_safetensors_metadata`]. 
+ + For more details regarding the safetensors format, check out https://huggingface.co/docs/safetensors/index#format. + + Attributes: + metadata (`Dict`, *optional*): + The metadata contained in the 'model.safetensors.index.json' file, if it exists. Only populated for sharded + models. + sharded (`bool`): + Whether the repo contains a sharded model or not. + weight_map (`Dict[str, str]`): + A map of all weights. Keys are tensor names and values are filenames of the files containing the tensors. + files_metadata (`Dict[str, SafetensorsFileMetadata]`): + A map of all files metadata. Keys are filenames and values are the metadata of the corresponding file, as + a [`SafetensorsFileMetadata`] object. + parameter_count (`Dict[str, int]`): + A map of the number of parameters per data type. Keys are data types and values are the number of parameters + of that data type. + """ + + metadata: Optional[Dict] + sharded: bool + weight_map: Dict[TENSOR_NAME_T, FILENAME_T] # tensor name -> filename + files_metadata: Dict[FILENAME_T, SafetensorsFileMetadata] # filename -> metadata + parameter_count: Dict[DTYPE_T, int] = field(init=False) + + def __post_init__(self) -> None: + parameter_count: Dict[DTYPE_T, int] = defaultdict(int) + for file_metadata in self.files_metadata.values(): + for dtype, nb_parameters_ in file_metadata.parameter_count.items(): + parameter_count[dtype] += nb_parameters_ + self.parameter_count = dict(parameter_count) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_subprocess.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_subprocess.py new file mode 100644 index 0000000000000000000000000000000000000000..fdabf1c4df3b61dc610ae08eb7842df6af3552f3 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_subprocess.py @@ -0,0 +1,144 @@ +# coding=utf-8 +# Copyright 2021 The HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License +"""Contains utilities to easily handle subprocesses in `huggingface_hub`.""" + +import os +import subprocess +import sys +from contextlib import contextmanager +from io import StringIO +from pathlib import Path +from typing import IO, Generator, List, Optional, Tuple, Union + +from .logging import get_logger + + +logger = get_logger(__name__) + + +@contextmanager +def capture_output() -> Generator[StringIO, None, None]: + """Capture output that is printed to terminal. + + Taken from https://stackoverflow.com/a/34738440 + + Example: + ```py + >>> with capture_output() as output: + ... print("hello world") + >>> assert output.getvalue() == "hello world\n" + ``` + """ + output = StringIO() + previous_output = sys.stdout + sys.stdout = output + try: + yield output + finally: + sys.stdout = previous_output + + +def run_subprocess( + command: Union[str, List[str]], + folder: Optional[Union[str, Path]] = None, + check=True, + **kwargs, +) -> subprocess.CompletedProcess: + """ + Method to run subprocesses. 
Calling this will capture `stderr` and `stdout`; call
+    `subprocess.run` manually if you would like them not to be captured.
+
+    Args:
+        command (`str` or `List[str]`):
+            The command to execute as a string or list of strings.
+        folder (`str`, *optional*):
+            The folder in which to run the command. Defaults to current working
+            directory (from `os.getcwd()`).
+        check (`bool`, *optional*, defaults to `True`):
+            Setting `check` to `True` will raise a `subprocess.CalledProcessError`
+            when the subprocess has a non-zero exit code.
+        kwargs (`Dict[str]`):
+            Keyword arguments to be passed to the `subprocess.run` underlying command.
+
+    Returns:
+        `subprocess.CompletedProcess`: The completed process.
+    """
+    if isinstance(command, str):
+        command = command.split()
+
+    if isinstance(folder, Path):
+        folder = str(folder)
+
+    return subprocess.run(
+        command,
+        stderr=subprocess.PIPE,
+        stdout=subprocess.PIPE,
+        check=check,
+        encoding="utf-8",
+        errors="replace",  # if not utf-8, replace char by �
+        cwd=folder or os.getcwd(),
+        **kwargs,
+    )
+
+
+@contextmanager
+def run_interactive_subprocess(
+    command: Union[str, List[str]],
+    folder: Optional[Union[str, Path]] = None,
+    **kwargs,
+) -> Generator[Tuple[IO[str], IO[str]], None, None]:
+    """Run a subprocess in an interactive mode in a context manager.
+
+    Args:
+        command (`str` or `List[str]`):
+            The command to execute as a string or list of strings.
+        folder (`str`, *optional*):
+            The folder in which to run the command. Defaults to current working
+            directory (from `os.getcwd()`).
+        kwargs (`Dict[str]`):
+            Keyword arguments to be passed to the `subprocess.run` underlying command.
+
+    Returns:
+        `Tuple[IO[str], IO[str]]`: A tuple with `stdin` and `stdout` to interact
+        with the process (the pipes are opened in text mode with utf-8 encoding).
+
+    Example:
+    ```python
+    with run_interactive_subprocess("git credential-store get") as (stdin, stdout):
+        # Write to stdin (text-mode pipe: pass `str`, not `bytes`)
+        stdin.write("url=hf.co\nusername=obama\n")
+        stdin.flush()
+
+        # Read from stdout
+        output = stdout.read()
+    ```
+    """
+    if isinstance(command, str):
+        command = command.split()
+
+    with subprocess.Popen(
+        command,
+        stdin=subprocess.PIPE,
+        stdout=subprocess.PIPE,
+        stderr=subprocess.STDOUT,
+        encoding="utf-8",
+        errors="replace",  # if not utf-8, replace char by �
+        cwd=folder or os.getcwd(),
+        **kwargs,
+    ) as process:
+        assert process.stdin is not None, "subprocess is opened as subprocess.PIPE"
+        assert process.stdout is not None, "subprocess is opened as subprocess.PIPE"
+        yield process.stdin, process.stdout
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_telemetry.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_telemetry.py
new file mode 100644
index 0000000000000000000000000000000000000000..2ba4a6349a8de1c565263ec73d235d36f88b68cf
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_telemetry.py
@@ -0,0 +1,126 @@
+from queue import Queue
+from threading import Lock, Thread
+from typing import Dict, Optional, Union
+from urllib.parse import quote
+
+from .. import constants, logging
+from . import build_hf_headers, get_session, hf_raise_for_status
+
+
+logger = logging.get_logger(__name__)
+
+# Telemetry is sent by a separate thread to avoid blocking the main thread.
+# A daemon thread is started once and consumes tasks from the _TELEMETRY_QUEUE.
+# If the thread stops for some reason (shouldn't happen), a new one is started.
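+#
+# A minimal sketch of the flow implemented below (all names are the ones defined
+# in this module):
+#
+#     send_telemetry("my/topic")        # called from the main thread, returns immediately
+#       -> _start_telemetry_thread()    # lazily starts the daemon worker (at most once)
+#       -> _TELEMETRY_QUEUE.put(...)    # enqueues the task for the worker
+#     _telemetry_worker()               # daemon thread: pops tasks from the queue
+#       -> _send_telemetry_in_thread()  # performs the actual HTTP call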
+_TELEMETRY_THREAD: Optional[Thread] = None
+_TELEMETRY_THREAD_LOCK = Lock()  # Lock to avoid starting multiple threads in parallel
+_TELEMETRY_QUEUE: Queue = Queue()
+
+
+def send_telemetry(
+    topic: str,
+    *,
+    library_name: Optional[str] = None,
+    library_version: Optional[str] = None,
+    user_agent: Union[Dict, str, None] = None,
+) -> None:
+    """
+    Sends telemetry that helps track usage of different HF libraries.
+
+    This usage data helps us debug issues and prioritize new features. However, we understand that not everyone wants
+    to share additional information, and we respect your privacy. You can disable telemetry collection by setting the
+    `HF_HUB_DISABLE_TELEMETRY=1` as environment variable. Telemetry is also disabled in offline mode (i.e. when setting
+    `HF_HUB_OFFLINE=1`).
+
+    Telemetry collection is run in a separate thread to minimize impact for the user.
+
+    Args:
+        topic (`str`):
+            Name of the topic that is monitored. The topic is directly used to build the URL. If you want to monitor
+            subtopics, just use "/" separation. Examples: "gradio", "transformers/examples",...
+        library_name (`str`, *optional*):
+            The name of the library that is making the HTTP request. Will be added to the user-agent header.
+        library_version (`str`, *optional*):
+            The version of the library that is making the HTTP request. Will be added to the user-agent header.
+        user_agent (`str`, `dict`, *optional*):
+            The user agent info in the form of a dictionary or a single string. It will be completed with information about the installed packages.
+
+    Example:
+    ```py
+    >>> from huggingface_hub.utils import send_telemetry
+
+    # Send telemetry without library information
+    >>> send_telemetry("ping")
+
+    # Send telemetry to subtopic with library information
+    >>> send_telemetry("gradio/local_link", library_name="gradio", library_version="3.22.1")
+
+    # Send telemetry with additional data
+    >>> send_telemetry(
+    ...     topic="examples",
+    ...     library_name="transformers",
+    ...     library_version="4.26.0",
+    ...     user_agent={"pipeline": "text_classification", "framework": "flax"},
+    ... )
+    ```
+    """
+    if constants.HF_HUB_OFFLINE or constants.HF_HUB_DISABLE_TELEMETRY:
+        return
+
+    _start_telemetry_thread()  # starts thread only if doesn't exist yet
+    _TELEMETRY_QUEUE.put(
+        {"topic": topic, "library_name": library_name, "library_version": library_version, "user_agent": user_agent}
+    )
+
+
+def _start_telemetry_thread():
+    """Start a daemon thread to consume tasks from the telemetry queue.
+
+    If the thread is interrupted, start a new one.
+    """
+    with _TELEMETRY_THREAD_LOCK:  # avoid starting multiple threads if called concurrently
+        global _TELEMETRY_THREAD
+        if _TELEMETRY_THREAD is None or not _TELEMETRY_THREAD.is_alive():
+            _TELEMETRY_THREAD = Thread(target=_telemetry_worker, daemon=True)
+            _TELEMETRY_THREAD.start()
+
+
+def _telemetry_worker():
+    """Wait for a task and consume it."""
+    while True:
+        kwargs = _TELEMETRY_QUEUE.get()
+        _send_telemetry_in_thread(**kwargs)
+        _TELEMETRY_QUEUE.task_done()
+
+
+def _send_telemetry_in_thread(
+    topic: str,
+    *,
+    library_name: Optional[str] = None,
+    library_version: Optional[str] = None,
+    user_agent: Union[Dict, str, None] = None,
+) -> None:
+    """Contains the actual logic that sends data to the Hub.
+
+    This function is called directly in gradio's analytics because
+    it is not possible to send telemetry from a daemon thread.
+
+    See here: https://github.com/gradio-app/gradio/pull/8180
+
+    Please do not rename or remove this function.
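+
+    Example (a sketch; the topic and library values shown are hypothetical):
+    ```py
+    >>> from huggingface_hub.utils._telemetry import _send_telemetry_in_thread
+
+    # Same signature as `send_telemetry`, but performs the HTTP call synchronously
+    >>> _send_telemetry_in_thread("gradio/launched", library_name="gradio", library_version="4.0.0")
+    ```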
+ """ + path = "/".join(quote(part) for part in topic.split("/") if len(part) > 0) + try: + r = get_session().head( + f"{constants.ENDPOINT}/api/telemetry/{path}", + headers=build_hf_headers( + token=False, # no need to send a token for telemetry + library_name=library_name, + library_version=library_version, + user_agent=user_agent, + ), + ) + hf_raise_for_status(r) + except Exception as e: + # We don't want to error in case of connection errors of any kind. + logger.debug(f"Error while sending telemetry: {e}") diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_typing.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_typing.py new file mode 100644 index 0000000000000000000000000000000000000000..8c5d6381a2a73afa08698bb99193f1774fb02f64 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_typing.py @@ -0,0 +1,95 @@ +# coding=utf-8 +# Copyright 2022-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Handle typing imports based on system compatibility.""" + +import sys +from typing import Any, Callable, List, Literal, Optional, Set, Type, TypeVar, Union, get_args, get_origin + + +UNION_TYPES: List[Any] = [Union] +if sys.version_info >= (3, 10): + from types import UnionType + + UNION_TYPES += [UnionType] + + +HTTP_METHOD_T = Literal["GET", "OPTIONS", "HEAD", "POST", "PUT", "PATCH", "DELETE"] + +# type hint meaning "function signature not changed by decorator" +CallableT = TypeVar("CallableT", bound=Callable) + +_JSON_SERIALIZABLE_TYPES = (int, float, str, bool, type(None)) + + +def is_jsonable(obj: Any, _visited: Optional[Set[int]] = None) -> bool: + """Check if an object is JSON serializable. + + This is a weak check, as it does not check for the actual JSON serialization, but only for the types of the object. + It works correctly for basic use cases but do not guarantee an exhaustive check. + + Object is considered to be recursively json serializable if: + - it is an instance of int, float, str, bool, or NoneType + - it is a list or tuple and all its items are json serializable + - it is a dict and all its keys are strings and all its values are json serializable + + Uses a visited set to avoid infinite recursion on circular references. If object has already been visited, it is + considered not json serializable. 
+ """ + # Initialize visited set to track object ids and detect circular references + if _visited is None: + _visited = set() + + # Detect circular reference + obj_id = id(obj) + if obj_id in _visited: + return False + + # Add current object to visited before recursive checks + _visited.add(obj_id) + try: + if isinstance(obj, _JSON_SERIALIZABLE_TYPES): + return True + if isinstance(obj, (list, tuple)): + return all(is_jsonable(item, _visited) for item in obj) + if isinstance(obj, dict): + return all( + isinstance(key, _JSON_SERIALIZABLE_TYPES) and is_jsonable(value, _visited) + for key, value in obj.items() + ) + if hasattr(obj, "__json__"): + return True + return False + except RecursionError: + return False + finally: + # Remove the object id from visited to avoid side‑effects for other branches + _visited.discard(obj_id) + + +def is_simple_optional_type(type_: Type) -> bool: + """Check if a type is optional, i.e. Optional[Type] or Union[Type, None] or Type | None, where Type is a non-composite type.""" + if get_origin(type_) in UNION_TYPES: + union_args = get_args(type_) + if len(union_args) == 2 and type(None) in union_args: + return True + return False + + +def unwrap_simple_optional_type(optional_type: Type) -> Type: + """Unwraps a simple optional type, i.e. returns Type from Optional[Type].""" + for arg in get_args(optional_type): + if arg is not type(None): + return arg + raise ValueError(f"'{optional_type}' is not an optional type") diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py new file mode 100644 index 0000000000000000000000000000000000000000..4bc219611b2132d699643975d00cf99853e03e47 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py @@ -0,0 +1,226 @@ +# coding=utf-8 +# Copyright 2022-present, the HuggingFace Inc. team. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Contains utilities to validate argument values in `huggingface_hub`.""" + +import inspect +import re +import warnings +from functools import wraps +from itertools import chain +from typing import Any, Dict + +from huggingface_hub.errors import HFValidationError + +from ._typing import CallableT + + +REPO_ID_REGEX = re.compile( + r""" + ^ + (\b[\w\-.]+\b/)? # optional namespace (username or organization) + \b # starts with a word boundary + [\w\-.]{1,96} # repo_name: alphanumeric + . _ - + \b # ends with a word boundary + $ + """, + flags=re.VERBOSE, +) + + +def validate_hf_hub_args(fn: CallableT) -> CallableT: + """Validate values received as argument for any public method of `huggingface_hub`. + + The goal of this decorator is to harmonize validation of arguments reused + everywhere. By default, all defined validators are tested. + + Validators: + - [`~utils.validate_repo_id`]: `repo_id` must be `"repo_name"` + or `"namespace/repo_name"`. Namespace is a username or an organization. 
+ - [`~utils.smoothly_deprecate_use_auth_token`]: Use `token` instead of + `use_auth_token` (only if `use_auth_token` is not expected by the decorated + function - in practice, always the case in `huggingface_hub`). + + Example: + ```py + >>> from huggingface_hub.utils import validate_hf_hub_args + + >>> @validate_hf_hub_args + ... def my_cool_method(repo_id: str): + ... print(repo_id) + + >>> my_cool_method(repo_id="valid_repo_id") + valid_repo_id + + >>> my_cool_method("other..repo..id") + huggingface_hub.utils._validators.HFValidationError: Cannot have -- or .. in repo_id: 'other..repo..id'. + + >>> my_cool_method(repo_id="other..repo..id") + huggingface_hub.utils._validators.HFValidationError: Cannot have -- or .. in repo_id: 'other..repo..id'. + + >>> @validate_hf_hub_args + ... def my_cool_auth_method(token: str): + ... print(token) + + >>> my_cool_auth_method(token="a token") + "a token" + + >>> my_cool_auth_method(use_auth_token="a use_auth_token") + "a use_auth_token" + + >>> my_cool_auth_method(token="a token", use_auth_token="a use_auth_token") + UserWarning: Both `token` and `use_auth_token` are passed (...) + "a token" + ``` + + Raises: + [`~utils.HFValidationError`]: + If an input is not valid. + """ + # TODO: add an argument to opt-out validation for specific argument? + signature = inspect.signature(fn) + + # Should the validator switch `use_auth_token` values to `token`? In practice, always + # True in `huggingface_hub`. Might not be the case in a downstream library. + check_use_auth_token = "use_auth_token" not in signature.parameters and "token" in signature.parameters + + @wraps(fn) + def _inner_fn(*args, **kwargs): + has_token = False + for arg_name, arg_value in chain( + zip(signature.parameters, args), # Args values + kwargs.items(), # Kwargs values + ): + if arg_name in ["repo_id", "from_id", "to_id"]: + validate_repo_id(arg_value) + + elif arg_name == "token" and arg_value is not None: + has_token = True + + if check_use_auth_token: + kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs) + + return fn(*args, **kwargs) + + return _inner_fn # type: ignore + + +def validate_repo_id(repo_id: str) -> None: + """Validate `repo_id` is valid. + + This is not meant to replace the proper validation made on the Hub but rather to + avoid local inconsistencies whenever possible (example: passing `repo_type` in the + `repo_id` is forbidden). + + Rules: + - Between 1 and 96 characters. + - Either "repo_name" or "namespace/repo_name" + - [a-zA-Z0-9] or "-", "_", "." + - "--" and ".." are forbidden + + Valid: `"foo"`, `"foo/bar"`, `"123"`, `"Foo-BAR_foo.bar123"` + + Not valid: `"datasets/foo/bar"`, `".repo_id"`, `"foo--bar"`, `"foo.git"` + + Example: + ```py + >>> from huggingface_hub.utils import validate_repo_id + >>> validate_repo_id(repo_id="valid_repo_id") + >>> validate_repo_id(repo_id="other..repo..id") + huggingface_hub.utils._validators.HFValidationError: Cannot have -- or .. in repo_id: 'other..repo..id'. + ``` + + Discussed in https://github.com/huggingface/huggingface_hub/issues/1008. 
In moon-landing (internal repository):
+    - https://github.com/huggingface/moon-landing/blob/main/server/lib/Names.ts#L27
+    - https://github.com/huggingface/moon-landing/blob/main/server/views/components/NewRepoForm/NewRepoForm.svelte#L138
+    """
+    if not isinstance(repo_id, str):
+        # Typically, a Path is not a repo_id
+        raise HFValidationError(f"Repo id must be a string, not {type(repo_id)}: '{repo_id}'.")
+
+    if repo_id.count("/") > 1:
+        raise HFValidationError(
+            "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
+            f" '{repo_id}'. Use `repo_type` argument if needed."
+        )
+
+    if not REPO_ID_REGEX.match(repo_id):
+        raise HFValidationError(
+            "Repo id must use alphanumeric chars, '-', '_' or '.'."
+            " The name cannot start or end with '-' or '.' and the maximum length is 96:"
+            f" '{repo_id}'."
+        )
+
+    if "--" in repo_id or ".." in repo_id:
+        raise HFValidationError(f"Cannot have -- or .. in repo_id: '{repo_id}'.")
+
+    if repo_id.endswith(".git"):
+        raise HFValidationError(f"Repo_id cannot end with '.git': '{repo_id}'.")
+
+
+def smoothly_deprecate_use_auth_token(fn_name: str, has_token: bool, kwargs: Dict[str, Any]) -> Dict[str, Any]:
+    """Smoothly deprecate `use_auth_token` in the `huggingface_hub` codebase.
+
+    The long-term goal is to remove any mention of `use_auth_token` in the codebase in
+    favor of a unique and less verbose `token` argument. This will be done in a few steps:
+
+    0. Step 0: methods that require a read-access to the Hub use the `use_auth_token`
+       argument (`str`, `bool` or `None`). Methods requiring write-access have a `token`
+       argument (`str`, `None`). This implicit rule exists to be able to not send the
+       token when not necessary (`use_auth_token=False`) even if logged in.
+
+    1. Step 1: we want to harmonize everything and use `token` everywhere (supporting
+       `token=False` for read-only methods). In order not to break existing code, if
+       `use_auth_token` is passed to a function, the `use_auth_token` value is passed
+       as `token` instead, without any warning.
+       a. Corner case: if both `use_auth_token` and `token` values are passed, a warning
+          is thrown and the `use_auth_token` value is ignored.
+
+    2. Step 2: Once it is released, we should push downstream libraries to switch from
+       `use_auth_token` to `token` as much as possible, but without throwing a warning
+       (e.g. manually create issues on the corresponding repos).
+
+    3. Step 3: After a transitional period (6 months e.g. until April 2023?), we update
+       `huggingface_hub` to throw a warning on `use_auth_token`. Hopefully, very few
+       users will be impacted as it would have already been fixed.
+       In addition, unit tests in `huggingface_hub` must be adapted to expect warnings
+       to be thrown (but still use `use_auth_token` as before).
+
+    4. Step 4: After a normal deprecation cycle (3 releases?), remove this validator.
+       `use_auth_token` will definitely not be supported.
+       In addition, we update unit tests in `huggingface_hub` to use `token` everywhere.
+
+    This has been discussed in:
+    - https://github.com/huggingface/huggingface_hub/issues/1094.
+    - https://github.com/huggingface/huggingface_hub/pull/928
+    - (related) https://github.com/huggingface/huggingface_hub/pull/1064
+    """
+    new_kwargs = kwargs.copy()  # do not mutate input!
+
+    use_auth_token = new_kwargs.pop("use_auth_token", None)  # remove from kwargs
+    if use_auth_token is not None:
+        if has_token:
+            warnings.warn(
+                "Both `token` and `use_auth_token` are passed to"
+                f" `{fn_name}` with non-None values.
`token` is now the" + " preferred argument to pass a User Access Token." + " `use_auth_token` value will be ignored." + ) + else: + # `token` argument is not passed and a non-None value is passed in + # `use_auth_token` => use `use_auth_token` value as `token` kwarg. + new_kwargs["token"] = use_auth_token + + return new_kwargs diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_xet.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_xet.py new file mode 100644 index 0000000000000000000000000000000000000000..3dcf99068f87eebdf3c684edf6026a576bd34eaf --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_xet.py @@ -0,0 +1,192 @@ +from dataclasses import dataclass +from enum import Enum +from typing import Dict, Optional + +import requests + +from .. import constants +from . import get_session, hf_raise_for_status, validate_hf_hub_args + + +class XetTokenType(str, Enum): + READ = "read" + WRITE = "write" + + +@dataclass(frozen=True) +class XetFileData: + file_hash: str + refresh_route: str + + +@dataclass(frozen=True) +class XetConnectionInfo: + access_token: str + expiration_unix_epoch: int + endpoint: str + + +def parse_xet_file_data_from_response( + response: requests.Response, endpoint: Optional[str] = None +) -> Optional[XetFileData]: + """ + Parse XET file metadata from an HTTP response. + + This function extracts XET file metadata from the HTTP headers or HTTP links + of a given response object. If the required metadata is not found, it returns `None`. + + Args: + response (`requests.Response`): + The HTTP response object containing headers dict and links dict to extract the XET metadata from. + Returns: + `Optional[XetFileData]`: + An instance of `XetFileData` containing the file hash and refresh route if the metadata + is found. Returns `None` if the required metadata is missing. + """ + if response is None: + return None + try: + file_hash = response.headers[constants.HUGGINGFACE_HEADER_X_XET_HASH] + + if constants.HUGGINGFACE_HEADER_LINK_XET_AUTH_KEY in response.links: + refresh_route = response.links[constants.HUGGINGFACE_HEADER_LINK_XET_AUTH_KEY]["url"] + else: + refresh_route = response.headers[constants.HUGGINGFACE_HEADER_X_XET_REFRESH_ROUTE] + except KeyError: + return None + endpoint = endpoint if endpoint is not None else constants.ENDPOINT + if refresh_route.startswith(constants.HUGGINGFACE_CO_URL_HOME): + refresh_route = refresh_route.replace(constants.HUGGINGFACE_CO_URL_HOME.rstrip("/"), endpoint.rstrip("/")) + return XetFileData( + file_hash=file_hash, + refresh_route=refresh_route, + ) + + +def parse_xet_connection_info_from_headers(headers: Dict[str, str]) -> Optional[XetConnectionInfo]: + """ + Parse XET connection info from the HTTP headers or return None if not found. + Args: + headers (`Dict`): + HTTP headers to extract the XET metadata from. + Returns: + `XetConnectionInfo` or `None`: + The information needed to connect to the XET storage service. + Returns `None` if the headers do not contain the XET connection info. 
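+
+    Example (a sketch using the module's own header-name constants; the values shown are hypothetical):
+    ```py
+    >>> headers = {
+    ...     constants.HUGGINGFACE_HEADER_X_XET_ENDPOINT: "https://xet.example",
+    ...     constants.HUGGINGFACE_HEADER_X_XET_ACCESS_TOKEN: "xet-token",
+    ...     constants.HUGGINGFACE_HEADER_X_XET_EXPIRATION: "1700000000",
+    ... }
+    >>> parse_xet_connection_info_from_headers(headers)
+    XetConnectionInfo(access_token='xet-token', expiration_unix_epoch=1700000000, endpoint='https://xet.example')
+    ```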
+ """ + try: + endpoint = headers[constants.HUGGINGFACE_HEADER_X_XET_ENDPOINT] + access_token = headers[constants.HUGGINGFACE_HEADER_X_XET_ACCESS_TOKEN] + expiration_unix_epoch = int(headers[constants.HUGGINGFACE_HEADER_X_XET_EXPIRATION]) + except (KeyError, ValueError, TypeError): + return None + + return XetConnectionInfo( + endpoint=endpoint, + access_token=access_token, + expiration_unix_epoch=expiration_unix_epoch, + ) + + +@validate_hf_hub_args +def refresh_xet_connection_info( + *, + file_data: XetFileData, + headers: Dict[str, str], +) -> XetConnectionInfo: + """ + Utilizes the information in the parsed metadata to request the Hub xet connection information. + This includes the access token, expiration, and XET service URL. + Args: + file_data: (`XetFileData`): + The file data needed to refresh the xet connection information. + headers (`Dict[str, str]`): + Headers to use for the request, including authorization headers and user agent. + Returns: + `XetConnectionInfo`: + The connection information needed to make the request to the xet storage service. + Raises: + [`~utils.HfHubHTTPError`] + If the Hub API returned an error. + [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError) + If the Hub API response is improperly formatted. + """ + if file_data.refresh_route is None: + raise ValueError("The provided xet metadata does not contain a refresh endpoint.") + return _fetch_xet_connection_info_with_url(file_data.refresh_route, headers) + + +@validate_hf_hub_args +def fetch_xet_connection_info_from_repo_info( + *, + token_type: XetTokenType, + repo_id: str, + repo_type: str, + revision: Optional[str] = None, + headers: Dict[str, str], + endpoint: Optional[str] = None, + params: Optional[Dict[str, str]] = None, +) -> XetConnectionInfo: + """ + Uses the repo info to request a xet access token from Hub. + Args: + token_type (`XetTokenType`): + Type of the token to request: `"read"` or `"write"`. + repo_id (`str`): + A namespace (user or an organization) and a repo name separated by a `/`. + repo_type (`str`): + Type of the repo to upload to: `"model"`, `"dataset"` or `"space"`. + revision (`str`, `optional`): + The revision of the repo to get the token for. + headers (`Dict[str, str]`): + Headers to use for the request, including authorization headers and user agent. + endpoint (`str`, `optional`): + The endpoint to use for the request. Defaults to the Hub endpoint. + params (`Dict[str, str]`, `optional`): + Additional parameters to pass with the request. + Returns: + `XetConnectionInfo`: + The connection information needed to make the request to the xet storage service. + Raises: + [`~utils.HfHubHTTPError`] + If the Hub API returned an error. + [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError) + If the Hub API response is improperly formatted. + """ + endpoint = endpoint if endpoint is not None else constants.ENDPOINT + url = f"{endpoint}/api/{repo_type}s/{repo_id}/xet-{token_type.value}-token/{revision}" + return _fetch_xet_connection_info_with_url(url, headers, params) + + +@validate_hf_hub_args +def _fetch_xet_connection_info_with_url( + url: str, + headers: Dict[str, str], + params: Optional[Dict[str, str]] = None, +) -> XetConnectionInfo: + """ + Requests the xet connection info from the supplied URL. This includes the + access token, expiration time, and endpoint to use for the xet storage service. + Args: + url: (`str`): + The access token endpoint URL. 
+        headers (`Dict[str, str]`):
+            Headers to use for the request, including authorization headers and user agent.
+        params (`Dict[str, str]`, `optional`):
+            Additional parameters to pass with the request.
+    Returns:
+        `XetConnectionInfo`:
+            The connection information needed to make the request to the xet storage service.
+    Raises:
+        [`~utils.HfHubHTTPError`]
+            If the Hub API returned an error.
+        [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError)
+            If the Hub API response is improperly formatted.
+    """
+    resp = get_session().get(headers=headers, url=url, params=params)
+    hf_raise_for_status(resp)
+
+    metadata = parse_xet_connection_info_from_headers(resp.headers)  # type: ignore
+    if metadata is None:
+        raise ValueError("Xet headers have not been correctly set by the server.")
+    return metadata
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_xet_progress_reporting.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_xet_progress_reporting.py
new file mode 100644
index 0000000000000000000000000000000000000000..e47740d5c5ea27253debc9e29c1eae9d10a6034f
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/_xet_progress_reporting.py
@@ -0,0 +1,162 @@
+from collections import OrderedDict
+from typing import List
+
+from hf_xet import PyItemProgressUpdate, PyTotalProgressUpdate
+
+from . import is_google_colab, is_notebook
+from .tqdm import tqdm
+
+
+class XetProgressReporter:
+    """
+    Reports on progress for Xet uploads.
+
+    Shows summary progress bars when running in notebooks or GUIs, and detailed per-file progress in console environments.
+    """
+
+    def __init__(self, n_lines: int = 10, description_width: int = 30):
+        self.n_lines = n_lines
+        self.description_width = description_width
+
+        self.per_file_progress = is_google_colab() or not is_notebook()
+
+        self.tqdm_settings = {
+            "unit": "B",
+            "unit_scale": True,
+            "leave": True,
+            "unit_divisor": 1000,
+            "nrows": n_lines + 3 if self.per_file_progress else 3,
+            "miniters": 1,
+            "bar_format": "{l_bar}{bar}| {n_fmt:>5}B / {total_fmt:>5}B{postfix:>12}",
+        }
+
+        # Overall progress bars
+        self.data_processing_bar = tqdm(
+            total=0, desc=self.format_desc("Processing Files (0 / 0)", False), position=0, **self.tqdm_settings
+        )
+
+        self.upload_bar = tqdm(
+            total=0, desc=self.format_desc("New Data Upload", False), position=1, **self.tqdm_settings
+        )
+
+        self.known_items: set[str] = set()
+        self.completed_items: set[str] = set()
+
+        # Item bars (scrolling view)
+        self.item_state: OrderedDict[str, PyItemProgressUpdate] = OrderedDict()
+        self.current_bars: List = [None] * self.n_lines
+
+    def format_desc(self, name: str, indent: bool) -> str:
+        """
+        If `name` is longer than `width` characters, prepends "..." and keeps the last
+        `width - 3` characters of the name; otherwise left-justifies the whole name
+        within `description_width` characters. Also adds some padding.
+        """
+
+        if not self.per_file_progress:
+            # Here we just use the defaults.
+            return name
+
+        padding = " " if indent else ""
+        width = self.description_width - len(padding)
+
+        if len(name) > width:
+            name = f"...{name[-(width - 3) :]}"
+
+        return f"{padding}{name.ljust(width)}"
+
+    def update_progress(self, total_update: PyTotalProgressUpdate, item_updates: List[PyItemProgressUpdate]):
+        # Update all the per-item values.
+        for item in item_updates:
+            item_name = item.item_name
+
+            self.known_items.add(item_name)
+
+            # Only care about items where the processing has already started.
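+            # An item with zero completed bytes has not been picked up by the
+            # uploader yet; skipping it keeps the scrolling view free for files
+            # that are actually in flight.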
+ if item.bytes_completed == 0: + continue + + # Overwrite the existing value in there. + self.item_state[item_name] = item + + bar_idx = 0 + new_completed = [] + + # Now, go through and update all the bars + for name, item in self.item_state.items(): + # Is this ready to be removed on the next update? + if item.bytes_completed == item.total_bytes: + self.completed_items.add(name) + new_completed.append(name) + + # If we're only showing summary information, then don't update the individual bars + if not self.per_file_progress: + continue + + # If we've run out of bars to use, then collapse the last ones together. + if bar_idx >= len(self.current_bars): + bar = self.current_bars[-1] + in_final_bar_mode = True + final_bar_aggregation_count = bar_idx + 1 - len(self.current_bars) + else: + bar = self.current_bars[bar_idx] + in_final_bar_mode = False + + if bar is None: + self.current_bars[bar_idx] = tqdm( + desc=self.format_desc(name, True), + position=2 + bar_idx, # Set to the position past the initial bars. + total=item.total_bytes, + initial=item.bytes_completed, + **self.tqdm_settings, + ) + + elif in_final_bar_mode: + bar.n += item.bytes_completed + bar.total += item.total_bytes + bar.set_description(self.format_desc(f"[+ {final_bar_aggregation_count} files]", True), refresh=False) + else: + bar.set_description(self.format_desc(name, True), refresh=False) + bar.n = item.bytes_completed + bar.total = item.total_bytes + + bar_idx += 1 + + # Remove all the completed ones from the ordered dictionary + for name in new_completed: + # Only remove ones from consideration to make room for more items coming in. + if len(self.item_state) <= self.n_lines: + break + + del self.item_state[name] + + if self.per_file_progress: + # Now manually refresh each of the bars + for bar in self.current_bars: + if bar: + bar.refresh() + + # Update overall bars + def postfix(speed): + s = tqdm.format_sizeof(speed) if speed is not None else "???" + return f"{s}B/s ".rjust(10, " ") + + self.data_processing_bar.total = total_update.total_bytes + self.data_processing_bar.set_description( + self.format_desc(f"Processing Files ({len(self.completed_items)} / {len(self.known_items)})", False), + refresh=False, + ) + self.data_processing_bar.set_postfix_str(postfix(total_update.total_bytes_completion_rate), refresh=False) + self.data_processing_bar.update(total_update.total_bytes_completion_increment) + + self.upload_bar.total = total_update.total_transfer_bytes + self.upload_bar.set_postfix_str(postfix(total_update.total_transfer_bytes_completion_rate), refresh=False) + self.upload_bar.update(total_update.total_transfer_bytes_completion_increment) + + def close(self, _success): + self.data_processing_bar.close() + self.upload_bar.close() + + if self.per_file_progress: + for bar in self.current_bars: + if bar: + bar.close() diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/endpoint_helpers.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/endpoint_helpers.py new file mode 100644 index 0000000000000000000000000000000000000000..85cd86011b78bcdc57034aeebc3c01e9e721ab50 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/endpoint_helpers.py @@ -0,0 +1,66 @@ +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+Helpful utility functions and classes in relation to exploring API endpoints
+with the aim for a user-friendly interface.
+"""
+
+import math
+import re
+from typing import TYPE_CHECKING
+
+from ..repocard_data import ModelCardData
+
+
+if TYPE_CHECKING:
+    from ..hf_api import ModelInfo
+
+
+def _is_emission_within_threshold(model_info: "ModelInfo", minimum_threshold: float, maximum_threshold: float) -> bool:
+    """Checks if a model's emission is within a given threshold.
+
+    Args:
+        model_info (`ModelInfo`):
+            A model info object containing the model's emission information.
+        minimum_threshold (`float`):
+            A minimum carbon threshold to filter by, such as 1.
+        maximum_threshold (`float`):
+            A maximum carbon threshold to filter by, such as 10.
+
+    Returns:
+        `bool`: Whether the model's emission is within the given threshold.
+    """
+    if minimum_threshold is None and maximum_threshold is None:
+        raise ValueError("`minimum_threshold` and `maximum_threshold` cannot both be `None`")
+    if minimum_threshold is None:
+        minimum_threshold = -1
+    if maximum_threshold is None:
+        maximum_threshold = math.inf
+
+    card_data = getattr(model_info, "card_data", None)
+    if card_data is None or not isinstance(card_data, (dict, ModelCardData)):
+        return False
+
+    # Get CO2 emission metadata
+    emission = card_data.get("co2_eq_emissions", None)
+    if isinstance(emission, dict):
+        emission = emission["emissions"]
+    if not emission:
+        return False
+
+    # Filter out if value is missing or out of range
+    matched = re.search(r"\d+\.\d+|\d+", str(emission))
+    if matched is None:
+        return False
+
+    emission_value = float(matched.group(0))
+    return minimum_threshold <= emission_value <= maximum_threshold
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/insecure_hashlib.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/insecure_hashlib.py
new file mode 100644
index 0000000000000000000000000000000000000000..6901b6d647cc706b85333a66f3bcb7d8c5e2ee9e
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/insecure_hashlib.py
@@ -0,0 +1,38 @@
+# Taken from https://github.com/mlflow/mlflow/pull/10119
+#
+# DO NOT use this module for security purposes (e.g., password hashing).
+#
+# In Python >= 3.9, insecure hashing algorithms such as MD5 fail in FIPS-compliant
+# environments unless `usedforsecurity=False` is explicitly passed.
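+#
+# For example (a sketch; actual behavior depends on the host's FIPS configuration):
+#
+#     hashlib.md5(b"payload")                          # may raise ValueError under FIPS
+#     hashlib.md5(b"payload", usedforsecurity=False)   # allowed on Python >= 3.9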
+# + # References: + # - https://github.com/mlflow/mlflow/issues/9905 + # - https://github.com/mlflow/mlflow/pull/10119 + # - https://docs.python.org/3/library/hashlib.html + # - https://github.com/huggingface/transformers/pull/27038 + # + # Usage: + # ```python + # # Use + # from huggingface_hub.utils.insecure_hashlib import sha256 + # # instead of + # from hashlib import sha256 + # + # # Use + # from huggingface_hub.utils import insecure_hashlib + # # instead of + # import hashlib + # ``` + import functools + import hashlib + import sys + + + if sys.version_info >= (3, 9): + md5 = functools.partial(hashlib.md5, usedforsecurity=False) + sha1 = functools.partial(hashlib.sha1, usedforsecurity=False) + sha256 = functools.partial(hashlib.sha256, usedforsecurity=False) + else: + md5 = hashlib.md5 + sha1 = hashlib.sha1 + sha256 = hashlib.sha256 diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/logging.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/logging.py new file mode 100644 index 0000000000000000000000000000000000000000..1e2f8ded83074b251a72c83edcf33205808250b9 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/logging.py @@ -0,0 +1,185 @@ +# coding=utf-8 + # Copyright 2020 Optuna, Hugging Face + # + # Licensed under the Apache License, Version 2.0 (the "License"); + # you may not use this file except in compliance with the License. + # You may obtain a copy of the License at + # + # http://www.apache.org/licenses/LICENSE-2.0 + # + # Unless required by applicable law or agreed to in writing, software + # distributed under the License is distributed on an "AS IS" BASIS, + # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + # See the License for the specific language governing permissions and + # limitations under the License. + """Logging utilities.""" + + import logging + import os + from logging import ( + CRITICAL, # NOQA + DEBUG, # NOQA + ERROR, # NOQA + FATAL, # NOQA + INFO, # NOQA + NOTSET, # NOQA + WARN, # NOQA + WARNING, # NOQA + ) + from typing import Optional + + from .. import constants + + + log_levels = { + "debug": logging.DEBUG, + "info": logging.INFO, + "warning": logging.WARNING, + "error": logging.ERROR, + "critical": logging.CRITICAL, + } + + _default_log_level = logging.WARNING + + + def _get_library_name() -> str: + return __name__.split(".")[0] + + + def _get_library_root_logger() -> logging.Logger: + return logging.getLogger(_get_library_name()) + + + def _get_default_logging_level(): + """ + If the `HF_HUB_VERBOSITY` env var is set to one of the valid choices, return that as the new + default level; otherwise, fall back to `_default_log_level`. + """ + env_level_str = os.getenv("HF_HUB_VERBOSITY", None) + if env_level_str: + if env_level_str in log_levels: + return log_levels[env_level_str] + else: + logging.getLogger().warning( + f"Unknown option HF_HUB_VERBOSITY={env_level_str}, must be one of: {', '.join(log_levels.keys())}" + ) + return _default_log_level + + + def _configure_library_root_logger() -> None: + library_root_logger = _get_library_root_logger() + library_root_logger.addHandler(logging.StreamHandler()) + library_root_logger.setLevel(_get_default_logging_level()) + + + def _reset_library_root_logger() -> None: + library_root_logger = _get_library_root_logger() + library_root_logger.setLevel(logging.NOTSET)
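+ + # A minimal usage sketch of the helpers defined below: + # ```python + # from huggingface_hub.utils import logging + # + # logger = logging.get_logger(__name__) + # logging.set_verbosity_debug() # root "huggingface_hub" logger now emits DEBUG records + # logger.debug("now visible") + # logging.set_verbosity(logging.WARNING) # restore the default level + # ```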
+ + + def get_logger(name: Optional[str] = None) -> logging.Logger: + """ + Returns a logger with the specified name. This function is not supposed + to be directly accessed by library users. + + Args: + name (`str`, *optional*): + The name of the logger to get, usually the filename + + Example: + + ```python + >>> from huggingface_hub import get_logger + >>> from huggingface_hub.utils import logging + + >>> logger = get_logger(__file__) + >>> logging.set_verbosity_info() + ``` + """ + + if name is None: + name = _get_library_name() + + return logging.getLogger(name) + + + def get_verbosity() -> int: + """Return the current level for the HuggingFace Hub's root logger. + + Returns: + Logging level, e.g., `huggingface_hub.logging.DEBUG` and + `huggingface_hub.logging.INFO`. + + > [!TIP] + > HuggingFace Hub has the following logging levels: + > + > - `huggingface_hub.logging.CRITICAL`, `huggingface_hub.logging.FATAL` + > - `huggingface_hub.logging.ERROR` + > - `huggingface_hub.logging.WARNING`, `huggingface_hub.logging.WARN` + > - `huggingface_hub.logging.INFO` + > - `huggingface_hub.logging.DEBUG` + """ + return _get_library_root_logger().getEffectiveLevel() + + + def set_verbosity(verbosity: int) -> None: + """ + Sets the level for the HuggingFace Hub's root logger. + + Args: + verbosity (`int`): + Logging level, e.g., `huggingface_hub.logging.DEBUG` and + `huggingface_hub.logging.INFO`. + """ + _get_library_root_logger().setLevel(verbosity) + + + def set_verbosity_info(): + """ + Sets the verbosity to `logging.INFO`. + """ + return set_verbosity(INFO) + + + def set_verbosity_warning(): + """ + Sets the verbosity to `logging.WARNING`. + """ + return set_verbosity(WARNING) + + + def set_verbosity_debug(): + """ + Sets the verbosity to `logging.DEBUG`. + """ + return set_verbosity(DEBUG) + + + def set_verbosity_error(): + """ + Sets the verbosity to `logging.ERROR`. + """ + return set_verbosity(ERROR) + + + def disable_propagation() -> None: + """ + Disable propagation of the library log outputs. Note that log propagation is + disabled by default. + """ + _get_library_root_logger().propagate = False + + + def enable_propagation() -> None: + """ + Enable propagation of the library log outputs. Please disable the + HuggingFace Hub's default handler to prevent double logging if the root + logger has been configured. + """ + _get_library_root_logger().propagate = True + + + _configure_library_root_logger() + + if constants.HF_DEBUG: + # If `HF_DEBUG` environment variable is set, set the verbosity of `huggingface_hub` logger to `DEBUG`. + set_verbosity_debug() diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/sha.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/sha.py new file mode 100644 index 0000000000000000000000000000000000000000..001c3fe8b2f37a64e890888ca3d521c10ec8f03b --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/sha.py @@ -0,0 +1,64 @@ +"""Utilities to efficiently compute the SHA-256 hash of a bunch of bytes.""" + + from typing import BinaryIO, Optional + + from .insecure_hashlib import sha1, sha256 + + + def sha_fileobj(fileobj: BinaryIO, chunk_size: Optional[int] = None) -> bytes: + """ + Computes the sha256 hash of the given file object, by chunks of size `chunk_size`. + + Args: + fileobj (file-like object): + The file object to compute sha256 for, typically obtained with `open(path, "rb")` + chunk_size (`int`, *optional*): + The number of bytes to read from `fileobj` at once, defaults to 1MB. + + Returns: + `bytes`: `fileobj`'s sha256 hash as bytes + """ + chunk_size = chunk_size if chunk_size is not None else 1024 * 1024 + + sha = sha256() + while True: + chunk = fileobj.read(chunk_size) + sha.update(chunk) + if not chunk: + break + return sha.digest()
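+ + # A minimal usage sketch (the file name is illustrative): + # ```python + # with open("model.safetensors", "rb") as f: + # digest = sha_fileobj(f) # 32-byte sha256 digest + # print(digest.hex()) + # ```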
+ + + def git_hash(data: bytes) -> str: + """ + Computes the git-sha1 hash of the given bytes, using the same algorithm as git. + + This is equivalent to running `git hash-object`. See https://git-scm.com/docs/git-hash-object + for more details. + + Note: this method is valid for regular files. For LFS files, the proper git hash is supposed to be computed on the + pointer file content, not the actual file content. However, for simplicity, we directly compare the sha256 of + the LFS file content when we want to compare LFS files. + + Args: + data (`bytes`): + The data to compute the git-hash for. + + Returns: + `str`: the git-hash of `data` as a hexadecimal string. + + Example: + ```python + >>> from huggingface_hub.utils.sha import git_hash + >>> git_hash(b"Hello, World!") + 'b45ef6fec89518d314f546fd6c3025367b721684' + ``` + """ + # Taken from https://gist.github.com/msabramo/763200 + # Note: no need to optimize by reading the file in chunks as we're not supposed to hash huge files (5MB maximum). + sha = sha1() + sha.update(b"blob ") + sha.update(str(len(data)).encode()) + sha.update(b"\0") + sha.update(data) + return sha.hexdigest()
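+ + # A quick cross-check sketch, assuming `git` is available on PATH: + # ```python + # import subprocess + # expected = subprocess.run( + # ["git", "hash-object", "--stdin"], input=b"Hello, World!", capture_output=True, check=True + # ).stdout.decode().strip() + # assert git_hash(b"Hello, World!") == expected + # ```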
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/tqdm.py b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/tqdm.py new file mode 100644 index 0000000000000000000000000000000000000000..4c1fcef4beb73bae13c57b3f66c5828e775b7cd9 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/huggingface_hub/utils/tqdm.py @@ -0,0 +1,307 @@ +# coding=utf-8 + # Copyright 2021 The HuggingFace Inc. team. All rights reserved. + # + # Licensed under the Apache License, Version 2.0 (the "License"); + # you may not use this file except in compliance with the License. + # You may obtain a copy of the License at + # + # http://www.apache.org/licenses/LICENSE-2.0 + # + # Unless required by applicable law or agreed to in writing, software + # distributed under the License is distributed on an "AS IS" BASIS, + # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + # See the License for the specific language governing permissions and + # limitations under the License + """Utility helpers to handle progress bars in `huggingface_hub`. + + Example: + 1. Use `huggingface_hub.utils.tqdm` as you would use `tqdm.tqdm` or `tqdm.auto.tqdm`. + 2. To disable progress bars, either use `disable_progress_bars()` helper or set the + environment variable `HF_HUB_DISABLE_PROGRESS_BARS` to 1. + 3. To re-enable progress bars, use `enable_progress_bars()`. + 4. To check whether progress bars are disabled, use `are_progress_bars_disabled()`. + + NOTE: Environment variable `HF_HUB_DISABLE_PROGRESS_BARS` takes priority. + + Example: + ```py + >>> from huggingface_hub.utils import are_progress_bars_disabled, disable_progress_bars, enable_progress_bars, tqdm + + # Disable progress bars globally + >>> disable_progress_bars() + + # Use as normal `tqdm` + >>> for _ in tqdm(range(5)): + ... pass + + # Still not showing progress bars, as `disable=False` is overwritten to `True`. + >>> for _ in tqdm(range(5), disable=False): + ... pass + + >>> are_progress_bars_disabled() + True + + # Re-enable progress bars globally + >>> enable_progress_bars() + + # Progress bar will be shown! + >>> for _ in tqdm(range(5)): + ... pass + 100%|███████████████████████████████████████| 5/5 [00:00<00:00, 117817.53it/s] + ``` + + Group-based control: + ```python + # Disable progress bars for a specific group + >>> disable_progress_bars("peft.foo") + + # Check state of different groups + >>> assert not are_progress_bars_disabled("peft") + >>> assert not are_progress_bars_disabled("peft.something") + >>> assert are_progress_bars_disabled("peft.foo") + >>> assert are_progress_bars_disabled("peft.foo.bar") + + # Enable progress bars for a subgroup + >>> enable_progress_bars("peft.foo.bar") + + # Check if enabling a subgroup affects the parent group + >>> assert are_progress_bars_disabled("peft.foo") + >>> assert not are_progress_bars_disabled("peft.foo.bar") + + # No progress bar for `name="peft.foo"` + >>> for _ in tqdm(range(5), name="peft.foo"): + ... pass + + # Progress bar will be shown for `name="peft.foo.bar"` + >>> for _ in tqdm(range(5), name="peft.foo.bar"): + ... pass + 100%|███████████████████████████████████████| 5/5 [00:00<00:00, 117817.53it/s] + + ``` + """ + + import io + import logging + import os + import warnings + from contextlib import contextmanager, nullcontext + from pathlib import Path + from typing import ContextManager, Dict, Iterator, Optional, Union + + from tqdm.auto import tqdm as old_tqdm + + from ..constants import HF_HUB_DISABLE_PROGRESS_BARS + + + # The `HF_HUB_DISABLE_PROGRESS_BARS` environment variable can be True, False, or not set (None), + # allowing for control over progress bar visibility. When set, this variable takes precedence + # over programmatic settings, dictating whether progress bars should be shown or hidden globally. + # Essentially, the environment variable's setting overrides any code-based configurations. + # + # If `HF_HUB_DISABLE_PROGRESS_BARS` is not defined (None), it implies that users can manage + # progress bar visibility through code. By default, progress bars are turned on. + + + progress_bar_states: Dict[str, bool] = {} + + + def disable_progress_bars(name: Optional[str] = None) -> None: + """ + Disable progress bars either globally or for a specified group. + + This function updates the state of progress bars based on a group name. + If no group name is provided, all progress bars are disabled. The operation + respects the `HF_HUB_DISABLE_PROGRESS_BARS` environment variable's setting. + + Args: + name (`str`, *optional*): + The name of the group for which to disable the progress bars. If None, + progress bars are disabled globally. + + Raises: + Warning: If the environment variable precludes changes. + """ + if HF_HUB_DISABLE_PROGRESS_BARS is False: + warnings.warn( + "Cannot disable progress bars: environment variable `HF_HUB_DISABLE_PROGRESS_BARS=0` is set and has priority." + ) + return + + if name is None: + progress_bar_states.clear() + progress_bar_states["_global"] = False + else: + keys_to_remove = [key for key in progress_bar_states if key.startswith(f"{name}.")] + for key in keys_to_remove: + del progress_bar_states[key] + progress_bar_states[name] = False
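+ + # A sketch of the resulting state mapping, assuming no `HF_HUB_DISABLE_PROGRESS_BARS` override: + # ```python + # disable_progress_bars("peft.foo") # progress_bar_states == {"peft.foo": False} + # enable_progress_bars("peft.foo.bar") # adds {"peft.foo.bar": True}; "peft.foo" stays disabled + # ```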
+ + + def enable_progress_bars(name: Optional[str] = None) -> None: + """ + Enable progress bars either globally or for a specified group. + + This function sets the progress bars to enabled for the specified group or globally + if no group is specified. The operation is subject to the `HF_HUB_DISABLE_PROGRESS_BARS` + environment setting. + + Args: + name (`str`, *optional*): + The name of the group for which to enable the progress bars. If None, + progress bars are enabled globally. + + Raises: + Warning: If the environment variable precludes changes. + """ + if HF_HUB_DISABLE_PROGRESS_BARS is True: + warnings.warn( + "Cannot enable progress bars: environment variable `HF_HUB_DISABLE_PROGRESS_BARS=1` is set and has priority." + ) + return + + if name is None: + progress_bar_states.clear() + progress_bar_states["_global"] = True + else: + keys_to_remove = [key for key in progress_bar_states if key.startswith(f"{name}.")] + for key in keys_to_remove: + del progress_bar_states[key] + progress_bar_states[name] = True + + + def are_progress_bars_disabled(name: Optional[str] = None) -> bool: + """ + Check if progress bars are disabled globally or for a specific group. + + This function returns whether progress bars are disabled for a given group or globally. + It checks the `HF_HUB_DISABLE_PROGRESS_BARS` environment variable first, then the programmatic + settings. + + Args: + name (`str`, *optional*): + The group name to check; if None, checks the global setting. + + Returns: + `bool`: True if progress bars are disabled, False otherwise. + """ + if HF_HUB_DISABLE_PROGRESS_BARS is True: + return True + + if name is None: + return not progress_bar_states.get("_global", True) + + while name: + if name in progress_bar_states: + return not progress_bar_states[name] + name = ".".join(name.split(".")[:-1]) + + return not progress_bar_states.get("_global", True) + + + def is_tqdm_disabled(log_level: int) -> Optional[bool]: + """ + Determine if tqdm progress bars should be disabled based on logging level and environment settings. + + See https://github.com/huggingface/huggingface_hub/pull/2000 and https://github.com/huggingface/huggingface_hub/pull/2698. + """ + if log_level == logging.NOTSET: + return True + if os.getenv("TQDM_POSITION") == "-1": + return False + return None + + + class tqdm(old_tqdm): + """ + Class to override `disable` argument in case progress bars are globally disabled. + + Taken from https://github.com/tqdm/tqdm/issues/619#issuecomment-619639324. + """ + + def __init__(self, *args, **kwargs): + name = kwargs.pop("name", None) # do not pass `name` to `tqdm` + if are_progress_bars_disabled(name): + kwargs["disable"] = True + super().__init__(*args, **kwargs) + + def __delattr__(self, attr: str) -> None: + """Fix for https://github.com/huggingface/huggingface_hub/issues/1603""" + try: + super().__delattr__(attr) + except AttributeError: + if attr != "_lock": + raise
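+ + # A minimal usage sketch of the `name` kwarg handled above (the group name is illustrative): + # ```python + # disable_progress_bars("my_lib.download") + # for _ in tqdm(range(3), name="my_lib.download"): # bar is silently disabled + # pass + # ```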
+ + + @contextmanager + def tqdm_stream_file(path: Union[Path, str]) -> Iterator[io.BufferedReader]: + """ + Open a file as binary and wrap the `read` method to display a progress bar when it's streamed. + + First implemented in `transformers` in 2019 but removed when `transformers` switched to git-lfs. Used in `huggingface_hub` to show + a progress bar when uploading an LFS file to the Hub. See github.com/huggingface/transformers/pull/2078#discussion_r354739608 + for implementation details. + + Note: the current implementation handles only files stored on disk, as that is the most common use case. It could be + extended to stream any `BinaryIO` object, but we might have to debug some corner cases. + + Example: + ```py + >>> with tqdm_stream_file("config.json") as f: + ... requests.put(url, data=f) + config.json: 100%|█████████████████████████| 8.19k/8.19k [00:02<00:00, 3.72kB/s] + ``` + """ + if isinstance(path, str): + path = Path(path) + + with path.open("rb") as f: + total_size = path.stat().st_size + pbar = tqdm( + unit="B", + unit_scale=True, + total=total_size, + initial=0, + desc=path.name, + ) + + f_read = f.read + + def _inner_read(size: Optional[int] = -1) -> bytes: + data = f_read(size) + pbar.update(len(data)) + return data + + f.read = _inner_read # type: ignore + + yield f + + pbar.close() + + + def _get_progress_bar_context( + *, + desc: str, + log_level: int, + total: Optional[int] = None, + initial: int = 0, + unit: str = "B", + unit_scale: bool = True, + name: Optional[str] = None, + _tqdm_bar: Optional[tqdm] = None, + ) -> ContextManager[tqdm]: + if _tqdm_bar is not None: + return nullcontext(_tqdm_bar) + # ^ `contextlib.nullcontext` mimics a context manager that does nothing + # Makes it easier to use the same code path for both cases but in the latter + # case, the progress bar is not closed when exiting the context manager. + + return tqdm( + unit=unit, + unit_scale=unit_scale, + total=total, + initial=initial, + desc=desc, + disable=is_tqdm_disabled(log_level=log_level), + name=name, + ) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..405579b1dbe41ee6189ee80d8cce7aeff76623cd Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/codec.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/codec.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..06a403f1440ff0ae5e0bec1ef032bc36e7f78582 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/codec.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/compat.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/compat.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..0cfc9f4fa64ff6c2b9a4e4f644069d7aa3fdbc2a Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/compat.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/core.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/core.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..24559280187afb15a520bea8235f5379a4f9b37f Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/core.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/idnadata.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/idnadata.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..cb3080cf4122e990b7e278df41e03fe4ffeda6ed Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/idnadata.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/intranges.cpython-312.pyc
b/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/intranges.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..bf273dea0ec7b2f10fab03bdf20fecffab857a64 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/intranges.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/package_data.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/package_data.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..cfade55982c6a0757ad8f3fef28a234271e267c3 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/idna/__pycache__/package_data.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..8d573524e76021dbc6058fd75c38958c4cba2b99 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/__pycache__/_definitions.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/__pycache__/_definitions.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3fa69c96e746e6aab2c1357abbd3e51398dedb1d Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/__pycache__/_definitions.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/__pycache__/_io.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/__pycache__/_io.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..f549739027068400ed4135a6a7f0d360003a2461 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/__pycache__/_io.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/__pycache__/_parsing.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/__pycache__/_parsing.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..634e6f4ce241e2b17270ce764a8553a568fb1c05 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/__pycache__/_parsing.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/__pycache__/_utils.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/__pycache__/_utils.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..0eb2061a9d959f21ab73b39c269d9e715f0eeff1 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/__pycache__/_utils.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/binaries/README.md b/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/binaries/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4f941f06087440f2d302916f7c53b394876d2dc1 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/binaries/README.md @@ -0,0 +1 @@ +Exes are dropped here by the release script. 
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/binaries/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/binaries/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..5919de3bbf4cdb56705f83e3cfa63e3c332925dd --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/binaries/__init__.py @@ -0,0 +1 @@ +# Just here to make importlib.resources work diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/binaries/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/binaries/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c521d06af65d5a952be6db00ea3a2d3ee2c0d52b Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/imageio_ffmpeg/binaries/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/nvidia_cudnn_cu12-9.10.2.21.dist-info/licenses/License.txt b/URSA/.venv_ursa/lib/python3.12/site-packages/nvidia_cudnn_cu12-9.10.2.21.dist-info/licenses/License.txt new file mode 100644 index 0000000000000000000000000000000000000000..f0d485c1c82d2c86b62ac0deeb8568fcdb58e0bb --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/nvidia_cudnn_cu12-9.10.2.21.dist-info/licenses/License.txt @@ -0,0 +1,154 @@ +LICENSE AGREEMENT FOR NVIDIA SOFTWARE DEVELOPMENT KITS + +This license agreement, including exhibits attached ("Agreement”) is a legal agreement between you and NVIDIA Corporation ("NVIDIA") and governs your use of a NVIDIA software development kit (“SDK”). + +Each SDK has its own set of software and materials, but here is a description of the types of items that may be included in a SDK: source code, header files, APIs, data sets and assets (examples include images, textures, models, scenes, videos, native API input/output files), binary software, sample code, libraries, utility programs, programming code and documentation. + +This Agreement can be accepted only by an adult of legal age of majority in the country in which the SDK is used. + +If you are entering into this Agreement on behalf of a company or other legal entity, you represent that you have the legal authority to bind the entity to this Agreement, in which case “you” will mean the entity you represent. + +If you don’t have the required age or authority to accept this Agreement, or if you don’t accept all the terms and conditions of this Agreement, do not download, install or use the SDK. + +You agree to use the SDK only for purposes that are permitted by (a) this Agreement, and (b) any applicable law, regulation or generally accepted practices or guidelines in the relevant jurisdictions. + +1. License. + +1.1 Grant + +Subject to the terms of this Agreement, NVIDIA hereby grants you a non-exclusive, non-transferable license, without the right to sublicense (except as expressly provided in this Agreement) to: + +(i) Install and use the SDK, + +(ii) Modify and create derivative works of sample source code delivered in the SDK, and + +(iii) Distribute those portions of the SDK that are identified in this Agreement as distributable, as incorporated in object code format into a software application that meets the distribution requirements indicated in this Agreement. 
+ +1.2 Distribution Requirements + +These are the distribution requirements for you to exercise the distribution grant: + +(i) Your application must have material additional functionality, beyond the included portions of the SDK. + +(ii) The distributable portions of the SDK shall only be accessed by your application. + +(iii) The following notice shall be included in modifications and derivative works of sample source code distributed: “This software contains source code provided by NVIDIA Corporation.” + +(iv) Unless a developer tool is identified in this Agreement as distributable, it is delivered for your internal use only. + +(v) The terms under which you distribute your application must be consistent with the terms of this Agreement, including (without limitation) terms relating to the license grant and license restrictions and protection of NVIDIA’s intellectual property rights. Additionally, you agree that you will protect the privacy, security and legal rights of your application users. + +(vi) You agree to notify NVIDIA in writing of any known or suspected distribution or use of the SDK not in compliance with the requirements of this Agreement, and to enforce the terms of your agreements with respect to distributed SDK. + +1.3 Authorized Users + +You may allow employees and contractors of your entity or of your subsidiary(ies) to access and use the SDK from your secure network to perform work on your behalf. + +If you are an academic institution you may allow users enrolled or employed by the academic institution to access and use the SDK from your secure network. + +You are responsible for the compliance with the terms of this Agreement by your authorized users. If you become aware that your authorized users didn’t follow the terms of this Agreement, you agree to take reasonable steps to resolve the non-compliance and prevent new occurrences. + +1.4 Pre-Release SDK +The SDK versions identified as alpha, beta, preview or otherwise as pre-release, may not be fully functional, may contain errors or design flaws, and may have reduced or different security, privacy, accessibility, availability, and reliability standards relative to commercial versions of NVIDIA software and materials. Use of a pre-release SDK may result in unexpected results, loss of data, project delays or other unpredictable damage or loss. +You may use a pre-release SDK at your own risk, understanding that pre-release SDKs are not intended for use in production or business-critical systems. +NVIDIA may choose not to make available a commercial version of any pre-release SDK. NVIDIA may also choose to abandon development and terminate the availability of a pre-release SDK at any time without liability. +1.5 Updates + +NVIDIA may, at its option, make available patches, workarounds or other updates to this SDK. Unless the updates are provided with their separate governing terms, they are deemed part of the SDK licensed to you as provided in this Agreement. + +You agree that the form and content of the SDK that NVIDIA provides may change without prior notice to you. While NVIDIA generally maintains compatibility between versions, NVIDIA may in some cases make changes that introduce incompatibilities in future versions of the SDK. + +1.6 Third Party Licenses + +The SDK may come bundled with, or otherwise include or be distributed with, third party software licensed by a NVIDIA supplier and/or open source software provided under an open source license. 
Use of third party software is subject to the third-party license terms, or in the absence of third party terms, the terms of this Agreement. Copyright to third party software is held by the copyright holders indicated in the third-party software or license. + +1.7 Reservation of Rights + +NVIDIA reserves all rights, title and interest in and to the SDK not expressly granted to you under this Agreement. + +2. Limitations. + +The following license limitations apply to your use of the SDK: + +2.1 You may not reverse engineer, decompile or disassemble, or remove copyright or other proprietary notices from any portion of the SDK or copies of the SDK. + +2.2 Except as expressly provided in this Agreement, you may not copy, sell, rent, sublicense, transfer, distribute, modify, or create derivative works of any portion of the SDK. + +2.3 Unless you have an agreement with NVIDIA for this purpose, you may not indicate that an application created with the SDK is sponsored or endorsed by NVIDIA. + +2.4 You may not bypass, disable, or circumvent any encryption, security, digital rights management or authentication mechanism in the SDK. + +2.5 You may not use the SDK in any manner that would cause it to become subject to an open source software license. As examples, licenses that require as a condition of use, modification, and/or distribution that the SDK be (i) disclosed or distributed in source code form; (ii) licensed for the purpose of making derivative works; or (iii) redistributable at no charge. + +2.6 Unless you have an agreement with NVIDIA for this purpose, you may not use the SDK with any system or application where the use or failure of the system or application can reasonably be expected to threaten or result in personal injury, death, or catastrophic loss. Examples include use in avionics, navigation, military, medical, life support or other life critical applications. NVIDIA does not design, test or manufacture the SDK for these critical uses and NVIDIA shall not be liable to you or any third party, in whole or in part, for any claims or damages arising from such uses. + +2.7 You agree to defend, indemnify and hold harmless NVIDIA and its affiliates, and their respective employees, contractors, agents, officers and directors, from and against any and all claims, damages, obligations, losses, liabilities, costs or debt, fines, restitutions and expenses (including but not limited to attorney’s fees and costs incident to establishing the right of indemnification) arising out of or related to your use of the SDK outside of the scope of this Agreement, or not in compliance with its terms. + +3. Ownership. + +3.1 NVIDIA or its licensors hold all rights, title and interest in and to the SDK and its modifications and derivative works, including their respective intellectual property rights, subject to your rights under Section 3.2. This SDK may include software and materials from NVIDIA’s licensors, and these licensors are intended third party beneficiaries that may enforce this Agreement with respect to their intellectual property rights. + +3.2 You hold all rights, title and interest in and to your applications and your derivative works of the sample source code delivered in the SDK, including their respective intellectual property rights, subject to NVIDIA’s rights under section 3.1. + +3.3 You may, but don’t have to, provide to NVIDIA suggestions, feature requests or other feedback regarding the SDK, including possible enhancements or modifications to the SDK. 
For any feedback that you voluntarily provide, you hereby grant NVIDIA and its affiliates a perpetual, non-exclusive, worldwide, irrevocable license to use, reproduce, modify, license, sublicense (through multiple tiers of sublicensees), and distribute (through multiple tiers of distributors) it without the payment of any royalties or fees to you. NVIDIA will use feedback at its choice. NVIDIA is constantly looking for ways to improve its products, so you may send feedback to NVIDIA through the developer portal at https://developer.nvidia.com. + +4. No Warranties. + +THE SDK IS PROVIDED BY NVIDIA “AS IS” AND “WITH ALL FAULTS.” TO THE MAXIMUM EXTENT PERMITTED BY LAW, NVIDIA AND ITS AFFILIATES EXPRESSLY DISCLAIM ALL WARRANTIES OF ANY KIND OR NATURE, WHETHER EXPRESS, IMPLIED OR STATUTORY, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, NON-INFRINGEMENT, OR THE ABSENCE OF ANY DEFECTS THEREIN, WHETHER LATENT OR PATENT. NO WARRANTY IS MADE ON THE BASIS OF TRADE USAGE, COURSE OF DEALING OR COURSE OF TRADE. + +5. Limitations of Liability. + +TO THE MAXIMUM EXTENT PERMITTED BY LAW, NVIDIA AND ITS AFFILIATES SHALL NOT BE LIABLE FOR ANY SPECIAL, INCIDENTAL, PUNITIVE OR CONSEQUENTIAL DAMAGES, OR ANY LOST PROFITS, LOSS OF USE, LOSS OF DATA OR LOSS OF GOODWILL, OR THE COSTS OF PROCURING SUBSTITUTE PRODUCTS, ARISING OUT OF OR IN CONNECTION WITH THIS AGREEMENT OR THE USE OR PERFORMANCE OF THE SDK, WHETHER SUCH LIABILITY ARISES FROM ANY CLAIM BASED UPON BREACH OF CONTRACT, BREACH OF WARRANTY, TORT (INCLUDING NEGLIGENCE), PRODUCT LIABILITY OR ANY OTHER CAUSE OF ACTION OR THEORY OF LIABILITY. IN NO EVENT WILL NVIDIA’S AND ITS AFFILIATES TOTAL CUMULATIVE LIABILITY UNDER OR ARISING OUT OF THIS AGREEMENT EXCEED US$10.00. THE NATURE OF THE LIABILITY OR THE NUMBER OF CLAIMS OR SUITS SHALL NOT ENLARGE OR EXTEND THIS LIMIT. + +These exclusions and limitations of liability shall apply regardless if NVIDIA or its affiliates have been advised of the possibility of such damages, and regardless of whether a remedy fails its essential purpose. These exclusions and limitations of liability form an essential basis of the bargain between the parties, and, absent any of these exclusions or limitations of liability, the provisions of this Agreement, including, without limitation, the economic terms, would be substantially different. + +6. Termination. + +6.1 This Agreement will continue to apply until terminated by either you or NVIDIA as described below. + +6.2 If you want to terminate this Agreement, you may do so by stopping to use the SDK. + +6.3 NVIDIA may, at any time, terminate this Agreement if: (i) you fail to comply with any term of this Agreement and the non-compliance is not fixed within thirty (30) days following notice from NVIDIA (or immediately if you violate NVIDIA’s intellectual property rights); (ii) you commence or participate in any legal proceeding against NVIDIA with respect to the SDK; or (iii) NVIDIA decides to no longer provide the SDK in a country or, in NVIDIA’s sole discretion, the continued use of it is no longer commercially viable. + +6.4 Upon any termination of this Agreement, you agree to promptly discontinue use of the SDK and destroy all copies in your possession or control. Your prior distributions in accordance with this Agreement are not affected by the termination of this Agreement. Upon written request, you will certify in writing that you have complied with your commitments under this section. 
Upon any termination of this Agreement all provisions survive except for the licenses granted to you. + +7. General. + +If you wish to assign this Agreement or your rights and obligations, including by merger, consolidation, dissolution or operation of law, contact NVIDIA to ask for permission. Any attempted assignment not approved by NVIDIA in writing shall be void and of no effect. NVIDIA may assign, delegate or transfer this Agreement and its rights and obligations, and if to a non-affiliate you will be notified. + +You agree to cooperate with NVIDIA and provide reasonably requested information to verify your compliance with this Agreement. + +This Agreement will be governed in all respects by the laws of the United States and of the State of Delaware as those laws are applied to contracts entered into and performed entirely within Delaware by Delaware residents, without regard to the conflicts of laws principles. The United Nations Convention on Contracts for the International Sale of Goods is specifically disclaimed. You agree to all terms of this Agreement in the English language. + +The state or federal courts residing in Santa Clara County, California shall have exclusive jurisdiction over any dispute or claim arising out of this Agreement. Notwithstanding this, you agree that NVIDIA shall still be allowed to apply for injunctive remedies or an equivalent type of urgent legal relief in any jurisdiction. + +If any court of competent jurisdiction determines that any provision of this Agreement is illegal, invalid or unenforceable, such provision will be construed as limited to the extent necessary to be consistent with and fully enforceable under the law and the remaining provisions will remain in full force and effect. Unless otherwise specified, remedies are cumulative. + +Each party acknowledges and agrees that the other is an independent contractor in the performance of this Agreement. + +The SDK has been developed entirely at private expense and is “commercial items” consisting of “commercial computer software” and “commercial computer software documentation” provided with RESTRICTED RIGHTS. Use, duplication or disclosure by the U.S. Government or a U.S. Government subcontractor is subject to the restrictions in this Agreement pursuant to DFARS 227.7202-3(a) or as set forth in subparagraphs (b)(1) and (2) of the Commercial Computer Software - Restricted Rights clause at FAR 52.227-19, as applicable. Contractor/manufacturer is NVIDIA, 2788 San Tomas Expressway, Santa Clara, CA 95051. + +The SDK is subject to United States export laws and regulations. You agree that you will not ship, transfer or export the SDK into any country, or use the SDK in any manner, prohibited by the United States Bureau of Industry and Security or economic sanctions regulations administered by the U.S. Department of Treasury’s Office of Foreign Assets Control (OFAC), or any applicable export laws, restrictions or regulations. These laws include restrictions on destinations, end users and end use. By accepting this Agreement, you confirm that you are not a resident or citizen of any country currently embargoed by the U.S. and that you are not otherwise prohibited from receiving the SDK. + +Any notice delivered by NVIDIA to you under this Agreement will be delivered via mail, email or fax. You agree that any notices that NVIDIA sends you electronically will satisfy any legal communication requirements. 
Please direct your legal notices or other correspondence to NVIDIA Corporation, 2788 San Tomas Expressway, Santa Clara, California 95051, United States of America, Attention: Legal Department. + +This Agreement and any exhibits incorporated into this Agreement constitute the entire agreement of the parties with respect to the subject matter of this Agreement and supersede all prior negotiations or documentation exchanged between the parties relating to this SDK license. Any additional and/or conflicting terms on documents issued by you are null, void, and invalid. Any amendment or waiver under this Agreement shall be in writing and signed by representatives of both parties. + +(v. January 28, 2020) + + +cuDNN SUPPLEMENT TO SOFTWARE LICENSE AGREEMENT FOR NVIDIA SOFTWARE DEVELOPMENT KITS + +The terms in this supplement govern your use of the NVIDIA cuDNN SDK under the terms of your license agreement (“Agreement”) as modified by this supplement. Capitalized terms used but not defined below have the meaning assigned to them in the Agreement. + +This supplement is an exhibit to the Agreement and is incorporated as an integral part of the Agreement. In the event of conflict between the terms in this supplement and the terms in the Agreement, the terms in this supplement govern. + +1. License Scope. The SDK is licensed for you to develop applications only for use in systems with NVIDIA GPUs. + +2. Distribution. The following portions of the SDK are distributable under the Agreement: the runtime files .so and .h, cudnn64_7.dll, and cudnn.lib. + +In addition to the rights above, for parties that are developing software intended solely for use on Jetson development kits or Jetson modules and running Linux for Tegra software the following shall apply: the SDK may be distributed in its entirety, as provided by NVIDIA and without separation of its components, for you and/or your licensees to create software development kits for use only on the Jetson platform and running Linux for Tegra software. + +3. Licensing. If the distribution terms in this Agreement are not suitable for your organization, or for any questions regarding this Agreement, please contact NVIDIA at nvidia-compute-license-questions@nvidia.com. + (v. January 28, 2020) + diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/nvidia_nvshmem_cu12-3.4.5.dist-info/licenses/License.txt b/URSA/.venv_ursa/lib/python3.12/site-packages/nvidia_nvshmem_cu12-3.4.5.dist-info/licenses/License.txt new file mode 100644 index 0000000000000000000000000000000000000000..b491c70e0aef319022ded661e111ddbd45b8a17f --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/nvidia_nvshmem_cu12-3.4.5.dist-info/licenses/License.txt @@ -0,0 +1,1568 @@ +End User License Agreement +-------------------------- + + +Preface +------- + +The Software License Agreement in Chapter 1 and the Supplement +in Chapter 2 contain license terms and conditions that govern +the use of NVIDIA software. By accepting this agreement, you +agree to comply with all the terms and conditions applicable +to the product(s) included herein. + + +NVIDIA Driver + + +Description + +This package contains the operating system driver and +fundamental system software components for NVIDIA GPUs.
+ + +NVIDIA CUDA Toolkit + + +Description + +The NVIDIA CUDA Toolkit provides command-line and graphical +tools for building, debugging and optimizing the performance +of applications accelerated by NVIDIA GPUs, runtime and math +libraries, and documentation including programming guides, +user manuals, and API references. + + +Default Install Location of CUDA Toolkit + +Windows platform: + +%ProgramFiles%\NVIDIA GPU Computing Toolkit\CUDA\v#.# + +Linux platform: + +/usr/local/cuda-#.# + +Mac platform: + +/Developer/NVIDIA/CUDA-#.# + + +NVIDIA CUDA Samples + + +Description + +This package includes over 100 CUDA examples that demonstrate +various CUDA programming principles, and efficient CUDA +implementation of algorithms in specific application domains. + + +Default Install Location of CUDA Samples + +Windows platform: + +%ProgramData%\NVIDIA Corporation\CUDA Samples\v#.# + +Linux platform: + +/usr/local/cuda-#.#/samples + +and + +$HOME/NVIDIA_CUDA-#.#_Samples + +Mac platform: + +/Developer/NVIDIA/CUDA-#.#/samples + + +NVIDIA Nsight Visual Studio Edition (Windows only) + + +Description + +NVIDIA Nsight Development Platform, Visual Studio Edition is a +development environment integrated into Microsoft Visual +Studio that provides tools for debugging, profiling, analyzing +and optimizing your GPU computing and graphics applications. + + +Default Install Location of Nsight Visual Studio Edition + +Windows platform: + +%ProgramFiles(x86)%\NVIDIA Corporation\Nsight Visual Studio Edition #.# + + +1. License Agreement for NVIDIA Software Development Kits +--------------------------------------------------------- + + +Release Date: July 26, 2018 +--------------------------- + + +Important Notice: Read before downloading, installing, +copying or using the licensed software: +------------------------------------------------------- + +This license agreement, including exhibits attached +("Agreement") is a legal agreement between you and NVIDIA +Corporation ("NVIDIA") and governs your use of a NVIDIA +software development kit (“SDK”). + +Each SDK has its own set of software and materials, but here +is a description of the types of items that may be included in +a SDK: source code, header files, APIs, data sets and assets +(examples include images, textures, models, scenes, videos, +native API input/output files), binary software, sample code, +libraries, utility programs, programming code and +documentation. + +This Agreement can be accepted only by an adult of legal age +of majority in the country in which the SDK is used. + +If you are entering into this Agreement on behalf of a company +or other legal entity, you represent that you have the legal +authority to bind the entity to this Agreement, in which case +“you” will mean the entity you represent. + +If you don’t have the required age or authority to accept +this Agreement, or if you don’t accept all the terms and +conditions of this Agreement, do not download, install or use +the SDK. + +You agree to use the SDK only for purposes that are permitted +by (a) this Agreement, and (b) any applicable law, regulation +or generally accepted practices or guidelines in the relevant +jurisdictions. + + +1.1. License + + +1.1.1. License Grant + +Subject to the terms of this Agreement, NVIDIA hereby grants +you a non-exclusive, non-transferable license, without the +right to sublicense (except as expressly provided in this +Agreement) to: + + 1. Install and use the SDK, + + 2.
Modify and create derivative works of sample source code + delivered in the SDK, and + + 3. Distribute those portions of the SDK that are identified + in this Agreement as distributable, as incorporated in + object code format into a software application that meets + the distribution requirements indicated in this Agreement. + + +1.1.2. Distribution Requirements + +These are the distribution requirements for you to exercise +the distribution grant: + + 1. Your application must have material additional + functionality, beyond the included portions of the SDK. + + 2. The distributable portions of the SDK shall only be + accessed by your application. + + 3. The following notice shall be included in modifications + and derivative works of sample source code distributed: + “This software contains source code provided by NVIDIA + Corporation.” + + 4. Unless a developer tool is identified in this Agreement + as distributable, it is delivered for your internal use + only. + + 5. The terms under which you distribute your application + must be consistent with the terms of this Agreement, + including (without limitation) terms relating to the + license grant and license restrictions and protection of + NVIDIA’s intellectual property rights. Additionally, you + agree that you will protect the privacy, security and + legal rights of your application users. + + 6. You agree to notify NVIDIA in writing of any known or + suspected distribution or use of the SDK not in compliance + with the requirements of this Agreement, and to enforce + the terms of your agreements with respect to distributed + SDK. + + +1.1.3. Authorized Users + +You may allow employees and contractors of your entity or of +your subsidiary(ies) to access and use the SDK from your +secure network to perform work on your behalf. + +If you are an academic institution you may allow users +enrolled or employed by the academic institution to access and +use the SDK from your secure network. + +You are responsible for the compliance with the terms of this +Agreement by your authorized users. If you become aware that +your authorized users didn’t follow the terms of this +Agreement, you agree to take reasonable steps to resolve the +non-compliance and prevent new occurrences. + + +1.1.4. Pre-Release SDK + +The SDK versions identified as alpha, beta, preview or +otherwise as pre-release, may not be fully functional, may +contain errors or design flaws, and may have reduced or +different security, privacy, accessibility, availability, and +reliability standards relative to commercial versions of +NVIDIA software and materials. Use of a pre-release SDK may +result in unexpected results, loss of data, project delays or +other unpredictable damage or loss. + +You may use a pre-release SDK at your own risk, understanding +that pre-release SDKs are not intended for use in production +or business-critical systems. + +NVIDIA may choose not to make available a commercial version +of any pre-release SDK. NVIDIA may also choose to abandon +development and terminate the availability of a pre-release +SDK at any time without liability. + + +1.1.5. Updates + +NVIDIA may, at its option, make available patches, workarounds +or other updates to this SDK. Unless the updates are provided +with their separate governing terms, they are deemed part of +the SDK licensed to you as provided in this Agreement. You +agree that the form and content of the SDK that NVIDIA +provides may change without prior notice to you. 
While NVIDIA +generally maintains compatibility between versions, NVIDIA may +in some cases make changes that introduce incompatibilities in +future versions of the SDK. + + +1.1.6. Third Party Licenses + +The SDK may come bundled with, or otherwise include or be +distributed with, third party software licensed by a NVIDIA +supplier and/or open source software provided under an open +source license. Use of third party software is subject to the +third-party license terms, or in the absence of third party +terms, the terms of this Agreement. Copyright to third party +software is held by the copyright holders indicated in the +third-party software or license. + + +1.1.7. Reservation of Rights + +NVIDIA reserves all rights, title, and interest in and to the +SDK, not expressly granted to you under this Agreement. + + +1.2. Limitations + +The following license limitations apply to your use of the +SDK: + + 1. You may not reverse engineer, decompile or disassemble, + or remove copyright or other proprietary notices from any + portion of the SDK or copies of the SDK. + + 2. Except as expressly provided in this Agreement, you may + not copy, sell, rent, sublicense, transfer, distribute, + modify, or create derivative works of any portion of the + SDK. For clarity, you may not distribute or sublicense the + SDK as a stand-alone product. + + 3. Unless you have an agreement with NVIDIA for this + purpose, you may not indicate that an application created + with the SDK is sponsored or endorsed by NVIDIA. + + 4. You may not bypass, disable, or circumvent any + encryption, security, digital rights management or + authentication mechanism in the SDK. + + 5. You may not use the SDK in any manner that would cause it + to become subject to an open source software license. As + examples, licenses that require as a condition of use, + modification, and/or distribution that the SDK be: + + a. Disclosed or distributed in source code form; + + b. Licensed for the purpose of making derivative works; + or + + c. Redistributable at no charge. + + 6. Unless you have an agreement with NVIDIA for this + purpose, you may not use the SDK with any system or + application where the use or failure of the system or + application can reasonably be expected to threaten or + result in personal injury, death, or catastrophic loss. + Examples include use in avionics, navigation, military, + medical, life support or other life critical applications. + NVIDIA does not design, test or manufacture the SDK for + these critical uses and NVIDIA shall not be liable to you + or any third party, in whole or in part, for any claims or + damages arising from such uses. + + 7. You agree to defend, indemnify and hold harmless NVIDIA + and its affiliates, and their respective employees, + contractors, agents, officers and directors, from and + against any and all claims, damages, obligations, losses, + liabilities, costs or debt, fines, restitutions and + expenses (including but not limited to attorney’s fees + and costs incident to establishing the right of + indemnification) arising out of or related to your use of + the SDK outside of the scope of this Agreement, or not in + compliance with its terms. + + +1.3. Ownership + + 1. NVIDIA or its licensors hold all rights, title and + interest in and to the SDK and its modifications and + derivative works, including their respective intellectual + property rights, subject to your rights described in this + section. 
This SDK may include software and materials from + NVIDIA’s licensors, and these licensors are intended + third party beneficiaries that may enforce this Agreement + with respect to their intellectual property rights. + + 2. You hold all rights, title and interest in and to your + applications and your derivative works of the sample + source code delivered in the SDK, including their + respective intellectual property rights, subject to + NVIDIA’s rights described in this section. + + 3. You may, but don’t have to, provide to NVIDIA + suggestions, feature requests or other feedback regarding + the SDK, including possible enhancements or modifications + to the SDK. For any feedback that you voluntarily provide, + you hereby grant NVIDIA and its affiliates a perpetual, + non-exclusive, worldwide, irrevocable license to use, + reproduce, modify, license, sublicense (through multiple + tiers of sublicensees), and distribute (through multiple + tiers of distributors) it without the payment of any + royalties or fees to you. NVIDIA will use feedback at its + choice. NVIDIA is constantly looking for ways to improve + its products, so you may send feedback to NVIDIA through + the developer portal at https://developer.nvidia.com. + + +1.4. No Warranties + +THE SDK IS PROVIDED BY NVIDIA “AS IS” AND “WITH ALL +FAULTS.” TO THE MAXIMUM EXTENT PERMITTED BY LAW, NVIDIA AND +ITS AFFILIATES EXPRESSLY DISCLAIM ALL WARRANTIES OF ANY KIND +OR NATURE, WHETHER EXPRESS, IMPLIED OR STATUTORY, INCLUDING, +BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, FITNESS +FOR A PARTICULAR PURPOSE, TITLE, NON-INFRINGEMENT, OR THE +ABSENCE OF ANY DEFECTS THEREIN, WHETHER LATENT OR PATENT. NO +WARRANTY IS MADE ON THE BASIS OF TRADE USAGE, COURSE OF +DEALING OR COURSE OF TRADE. + + +1.5. Limitation of Liability + +TO THE MAXIMUM EXTENT PERMITTED BY LAW, NVIDIA AND ITS +AFFILIATES SHALL NOT BE LIABLE FOR ANY SPECIAL, INCIDENTAL, +PUNITIVE OR CONSEQUENTIAL DAMAGES, OR ANY LOST PROFITS, LOSS +OF USE, LOSS OF DATA OR LOSS OF GOODWILL, OR THE COSTS OF +PROCURING SUBSTITUTE PRODUCTS, ARISING OUT OF OR IN CONNECTION +WITH THIS AGREEMENT OR THE USE OR PERFORMANCE OF THE SDK, +WHETHER SUCH LIABILITY ARISES FROM ANY CLAIM BASED UPON BREACH +OF CONTRACT, BREACH OF WARRANTY, TORT (INCLUDING NEGLIGENCE), +PRODUCT LIABILITY OR ANY OTHER CAUSE OF ACTION OR THEORY OF +LIABILITY. IN NO EVENT WILL NVIDIA’S AND ITS AFFILIATES +TOTAL CUMULATIVE LIABILITY UNDER OR ARISING OUT OF THIS +AGREEMENT EXCEED US$10.00. THE NATURE OF THE LIABILITY OR THE +NUMBER OF CLAIMS OR SUITS SHALL NOT ENLARGE OR EXTEND THIS +LIMIT. + +These exclusions and limitations of liability shall apply +regardless if NVIDIA or its affiliates have been advised of +the possibility of such damages, and regardless of whether a +remedy fails its essential purpose. These exclusions and +limitations of liability form an essential basis of the +bargain between the parties, and, absent any of these +exclusions or limitations of liability, the provisions of this +Agreement, including, without limitation, the economic terms, +would be substantially different. + + +1.6. Termination + + 1. This Agreement will continue to apply until terminated by + either you or NVIDIA as described below. + + 2. If you want to terminate this Agreement, you may do so by + stopping to use the SDK. + + 3. NVIDIA may, at any time, terminate this Agreement if: + + a. 
+      a. you fail to comply with any term of this
+         Agreement and the non-compliance is not fixed within
+         thirty (30) days following notice from NVIDIA (or
+         immediately if you violate NVIDIA’s intellectual
+         property rights);
+
+      b. you commence or participate in any legal
+         proceeding against NVIDIA with respect to the SDK; or
+
+      c. NVIDIA decides to no longer provide the SDK in
+         a country or, in NVIDIA’s sole discretion, the
+         continued use of it is no longer commercially viable.
+
+  4. Upon any termination of this Agreement, you agree to
+     promptly discontinue use of the SDK and destroy all copies
+     in your possession or control. Your prior distributions in
+     accordance with this Agreement are not affected by the
+     termination of this Agreement. Upon written request, you
+     will certify in writing that you have complied with your
+     commitments under this section. Upon any termination of
+     this Agreement all provisions survive except for the
+     license grant provisions.
+
+
+1.7. General
+
+If you wish to assign this Agreement or your rights and
+obligations, including by merger, consolidation, dissolution
+or operation of law, contact NVIDIA to ask for permission. Any
+attempted assignment not approved by NVIDIA in writing shall
+be void and of no effect. NVIDIA may assign, delegate or
+transfer this Agreement and its rights and obligations, and if
+to a non-affiliate you will be notified.
+
+You agree to cooperate with NVIDIA and provide reasonably
+requested information to verify your compliance with this
+Agreement.
+
+This Agreement will be governed in all respects by the laws of
+the United States and of the State of Delaware as those laws
+are applied to contracts entered into and performed entirely
+within Delaware by Delaware residents, without regard to the
+conflicts of laws principles. The United Nations Convention on
+Contracts for the International Sale of Goods is specifically
+disclaimed. You agree to all terms of this Agreement in the
+English language.
+
+The state or federal courts residing in Santa Clara County,
+California shall have exclusive jurisdiction over any dispute
+or claim arising out of this Agreement. Notwithstanding this,
+you agree that NVIDIA shall still be allowed to apply for
+injunctive remedies or an equivalent type of urgent legal
+relief in any jurisdiction.
+
+If any court of competent jurisdiction determines that any
+provision of this Agreement is illegal, invalid or
+unenforceable, such provision will be construed as limited to
+the extent necessary to be consistent with and fully
+enforceable under the law and the remaining provisions will
+remain in full force and effect. Unless otherwise specified,
+remedies are cumulative.
+
+Each party acknowledges and agrees that the other is an
+independent contractor in the performance of this Agreement.
+
+The SDK has been developed entirely at private expense and is
+“commercial items” consisting of “commercial computer
+software” and “commercial computer software
+documentation” provided with RESTRICTED RIGHTS. Use,
+duplication or disclosure by the U.S. Government or a U.S.
+Government subcontractor is subject to the restrictions in
+this Agreement pursuant to DFARS 227.7202-3(a) or as set forth
+in subparagraphs (c)(1) and (2) of the Commercial Computer
+Software - Restricted Rights clause at FAR 52.227-19, as
+applicable. Contractor/manufacturer is NVIDIA, 2788 San Tomas
+Expressway, Santa Clara, CA 95051.
+
+The SDK is subject to United States export laws and
+regulations.
You agree that you will not ship, transfer or +export the SDK into any country, or use the SDK in any manner, +prohibited by the United States Bureau of Industry and +Security or economic sanctions regulations administered by the +U.S. Department of Treasury’s Office of Foreign Assets +Control (OFAC), or any applicable export laws, restrictions or +regulations. These laws include restrictions on destinations, +end users and end use. By accepting this Agreement, you +confirm that you are not a resident or citizen of any country +currently embargoed by the U.S. and that you are not otherwise +prohibited from receiving the SDK. + +Any notice delivered by NVIDIA to you under this Agreement +will be delivered via mail, email or fax. You agree that any +notices that NVIDIA sends you electronically will satisfy any +legal communication requirements. Please direct your legal +notices or other correspondence to NVIDIA Corporation, 2788 +San Tomas Expressway, Santa Clara, California 95051, United +States of America, Attention: Legal Department. + +This Agreement and any exhibits incorporated into this +Agreement constitute the entire agreement of the parties with +respect to the subject matter of this Agreement and supersede +all prior negotiations or documentation exchanged between the +parties relating to this SDK license. Any additional and/or +conflicting terms on documents issued by you are null, void, +and invalid. Any amendment or waiver under this Agreement +shall be in writing and signed by representatives of both +parties. + + +2. CUDA Toolkit Supplement to Software License Agreement for +NVIDIA Software Development Kits +------------------------------------------------------------ + + +Release date: August 16, 2018 +----------------------------- + +The terms in this supplement govern your use of the NVIDIA +CUDA Toolkit SDK under the terms of your license agreement +(“Agreement”) as modified by this supplement. Capitalized +terms used but not defined below have the meaning assigned to +them in the Agreement. + +This supplement is an exhibit to the Agreement and is +incorporated as an integral part of the Agreement. In the +event of conflict between the terms in this supplement and the +terms in the Agreement, the terms in this supplement govern. + + +2.1. License Scope + +The SDK is licensed for you to develop applications only for +use in systems with NVIDIA GPUs. + + +2.2. Distribution + +The portions of the SDK that are distributable under the +Agreement are listed in Attachment A. + + +2.3. Operating Systems + +Those portions of the SDK designed exclusively for use on the +Linux or FreeBSD operating systems, or other operating systems +derived from the source code to these operating systems, may +be copied and redistributed for use in accordance with this +Agreement, provided that the object code files are not +modified in any way (except for unzipping of compressed +files). + + +2.4. Audio and Video Encoders and Decoders + +You acknowledge and agree that it is your sole responsibility +to obtain any additional third-party licenses required to +make, have made, use, have used, sell, import, and offer for +sale your products or services that include or incorporate any +third-party software and content relating to audio and/or +video encoders and decoders from, including but not limited +to, Microsoft, Thomson, Fraunhofer IIS, Sisvel S.p.A., +MPEG-LA, and Coding Technologies. 
NVIDIA does not grant to you
+under this Agreement any necessary patent or other rights with
+respect to any audio and/or video encoders and decoders.
+
+
+2.5. Licensing
+
+If the distribution terms in this Agreement are not suitable
+for your organization, or for any questions regarding this
+Agreement, please contact NVIDIA at
+nvidia-compute-license-questions@nvidia.com.
+
+
+2.6. Attachment A
+
+The following portions of the SDK are distributable under the
+Agreement:
+
+Component: CUDA Runtime
+  Windows: cudart.dll, cudart_static.lib, cudadevrt.lib
+  Mac OSX: libcudart.dylib, libcudart_static.a, libcudadevrt.a
+  Linux:   libcudart.so, libcudart_static.a, libcudadevrt.a
+  Android: libcudart.so, libcudart_static.a, libcudadevrt.a
+
+Component: CUDA FFT Library
+  Windows: cufft.dll, cufftw.dll, cufft.lib, cufftw.lib
+  Mac OSX: libcufft.dylib, libcufft_static.a, libcufftw.dylib,
+           libcufftw_static.a
+  Linux:   libcufft.so, libcufft_static.a, libcufftw.so,
+           libcufftw_static.a
+  Android: libcufft.so, libcufft_static.a, libcufftw.so,
+           libcufftw_static.a
+
+Component: CUDA BLAS Library
+  Windows: cublas.dll, cublasLt.dll
+  Mac OSX: libcublas.dylib, libcublasLt.dylib, libcublas_static.a,
+           libcublasLt_static.a
+  Linux:   libcublas.so, libcublasLt.so, libcublas_static.a,
+           libcublasLt_static.a
+  Android: libcublas.so, libcublasLt.so, libcublas_static.a,
+           libcublasLt_static.a
+
+Component: NVIDIA "Drop-in" BLAS Library
+  Windows: nvblas.dll
+  Mac OSX: libnvblas.dylib
+  Linux:   libnvblas.so
+
+Component: CUDA Sparse Matrix Library
+  Windows: cusparse.dll, cusparse.lib
+  Mac OSX: libcusparse.dylib, libcusparse_static.a
+  Linux:   libcusparse.so, libcusparse_static.a
+  Android: libcusparse.so, libcusparse_static.a
+
+Component: CUDA Linear Solver Library
+  Windows: cusolver.dll, cusolver.lib
+  Mac OSX: libcusolver.dylib, libcusolver_static.a
+  Linux:   libcusolver.so, libcusolver_static.a
+  Android: libcusolver.so, libcusolver_static.a
+
+Component: CUDA Random Number Generation Library
+  Windows: curand.dll, curand.lib
+  Mac OSX: libcurand.dylib, libcurand_static.a
+  Linux:   libcurand.so, libcurand_static.a
+  Android: libcurand.so, libcurand_static.a
+
+Component: CUDA Accelerated Graph Library
+
+Component: NVIDIA Performance Primitives Library
+  Windows: nppc.dll, nppc.lib, nppial.dll, nppial.lib, nppicc.dll,
+           nppicc.lib, nppicom.dll, nppicom.lib, nppidei.dll,
+           nppidei.lib, nppif.dll, nppif.lib, nppig.dll, nppig.lib,
+           nppim.dll, nppim.lib, nppist.dll, nppist.lib, nppisu.dll,
+           nppisu.lib, nppitc.dll, nppitc.lib, npps.dll, npps.lib
+  Mac OSX: libnppc.dylib, libnppc_static.a, libnppial.dylib,
+           libnppial_static.a, libnppicc.dylib, libnppicc_static.a,
+           libnppicom.dylib, libnppicom_static.a, libnppidei.dylib,
+           libnppidei_static.a, libnppif.dylib, libnppif_static.a,
+           libnppig.dylib, libnppig_static.a, libnppim.dylib,
+           libnppisu_static.a, libnppitc.dylib, libnppitc_static.a,
+           libnpps.dylib, libnpps_static.a
+  Linux:   libnppc.so, libnppc_static.a, libnppial.so,
+           libnppial_static.a, libnppicc.so, libnppicc_static.a,
+           libnppicom.so, libnppicom_static.a, libnppidei.so,
+           libnppidei_static.a, libnppif.so, libnppif_static.a,
+           libnppig.so, libnppig_static.a, libnppim.so,
+           libnppim_static.a, libnppist.so, libnppist_static.a,
+           libnppisu.so, libnppisu_static.a, libnppitc.so,
+           libnppitc_static.a, libnpps.so, libnpps_static.a
+  Android: libnppc.so, libnppc_static.a, libnppial.so,
+           libnppial_static.a, libnppicc.so, libnppicc_static.a,
+           libnppicom.so, libnppicom_static.a, libnppidei.so,
+           libnppidei_static.a, libnppif.so, libnppif_static.a,
+           libnppig.so, libnppig_static.a, libnppim.so,
+           libnppim_static.a, libnppist.so, libnppist_static.a,
+           libnppisu.so, libnppisu_static.a, libnppitc.so,
+           libnppitc_static.a, libnpps.so, libnpps_static.a
+
+Component: NVIDIA JPEG Library
+  Linux:   libnvjpeg.so, libnvjpeg_static.a
+
+Component: Internal common library required for statically linking
+           to cuBLAS, cuSPARSE, cuFFT, cuRAND, nvJPEG and NPP
+  Mac OSX: libculibos.a
+  Linux:   libculibos.a
+
+Component: NVIDIA Runtime Compilation Library and Header
+  All:     nvrtc.h
+  Windows: nvrtc.dll, nvrtc-builtins.dll
+  Mac OSX: libnvrtc.dylib, libnvrtc-builtins.dylib
+  Linux:   libnvrtc.so, libnvrtc-builtins.so
+
+Component: NVIDIA Optimizing Compiler Library
+  Windows: nvvm.dll
+  Mac OSX: libnvvm.dylib
+  Linux:   libnvvm.so
+
+Component: NVIDIA Common Device Math Functions Library
+  Windows: libdevice.10.bc
+  Mac OSX: libdevice.10.bc
+  Linux:   libdevice.10.bc
+
+Component: CUDA Occupancy Calculation Header Library
+  All:     cuda_occupancy.h
+
+Component: CUDA Half Precision Headers
+  All:     cuda_fp16.h, cuda_fp16.hpp
+
+Component: CUDA Profiling Tools Interface (CUPTI) Library
+  Windows: cupti.dll
+  Mac OSX: libcupti.dylib
+  Linux:   libcupti.so
+
+Component: NVIDIA Tools Extension Library
+  Windows: nvToolsExt.dll, nvToolsExt.lib
+  Mac OSX: libnvToolsExt.dylib
+  Linux:   libnvToolsExt.so
+
+Component: NVIDIA CUDA Driver Libraries
+  Linux:   libcuda.so, libnvidia-fatbinaryloader.so,
+           libnvidia-ptxjitcompiler.so
+
+The NVIDIA CUDA Driver Libraries are only distributable in
+applications that meet the following criteria:
+
+  1. The application was developed starting from a NVIDIA CUDA
+     container obtained from Docker Hub or the NVIDIA GPU
+     Cloud, and
+
+  2. The resulting application is packaged as a Docker
+     container and distributed to users on Docker Hub or the
+     NVIDIA GPU Cloud only.
+
+
+2.7. Attachment B
+
+
+Additional Licensing Obligations
+
+The following third party components included in the SOFTWARE
+are licensed to Licensee pursuant to the following terms and
+conditions:
+
+  1. Licensee's use of the GDB third party component is
+     subject to the terms and conditions of GNU GPL v3:
+
+     This product includes copyrighted third-party software licensed
+     under the terms of the GNU General Public License v3 ("GPL v3").
+     All third-party software packages are copyright by their respective
+     authors. GPL v3 terms and conditions are hereby incorporated into
+     the Agreement by this reference: http://www.gnu.org/licenses/gpl.txt
+
+     Consistent with these licensing requirements, the software
+     listed below is provided under the terms of the specified
+     open source software licenses. To obtain source code for
+     software provided under licenses that require
+     redistribution of source code, including the GNU General
+     Public License (GPL) and GNU Lesser General Public License
+     (LGPL), contact oss-requests@nvidia.com. This offer is
+     valid for a period of three (3) years from the date of the
+     distribution of this product by NVIDIA CORPORATION.
+
+     Component     License
+     CUDA-GDB      GPL v3
+
+  2. Licensee represents and warrants that any and all third
+     party licensing and/or royalty payment obligations in
+     connection with Licensee's use of the H.264 video codecs
+     are solely the responsibility of Licensee.
+
+  3.
Licensee's use of the Thrust library is subject to the + terms and conditions of the Apache License Version 2.0. + All third-party software packages are copyright by their + respective authors. Apache License Version 2.0 terms and + conditions are hereby incorporated into the Agreement by + this reference. + http://www.apache.org/licenses/LICENSE-2.0.html + + In addition, Licensee acknowledges the following notice: + Thrust includes source code from the Boost Iterator, + Tuple, System, and Random Number libraries. + + Boost Software License - Version 1.0 - August 17th, 2003 + . . . . + + Permission is hereby granted, free of charge, to any person or + organization obtaining a copy of the software and accompanying + documentation covered by this license (the "Software") to use, + reproduce, display, distribute, execute, and transmit the Software, + and to prepare derivative works of the Software, and to permit + third-parties to whom the Software is furnished to do so, all + subject to the following: + + The copyright notices in the Software and this entire statement, + including the above license grant, this restriction and the following + disclaimer, must be included in all copies of the Software, in whole + or in part, and all derivative works of the Software, unless such + copies or derivative works are solely in the form of machine-executable + object code generated by a source language processor. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND + NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR + ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR + OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING + FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + OTHER DEALINGS IN THE SOFTWARE. + + 4. Licensee's use of the LLVM third party component is + subject to the following terms and conditions: + + ====================================================== + LLVM Release License + ====================================================== + University of Illinois/NCSA + Open Source License + + Copyright (c) 2003-2010 University of Illinois at Urbana-Champaign. + All rights reserved. + + Developed by: + + LLVM Team + + University of Illinois at Urbana-Champaign + + http://llvm.org + + Permission is hereby granted, free of charge, to any person obtaining a copy + of this software and associated documentation files (the "Software"), to + deal with the Software without restriction, including without limitation the + rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + sell copies of the Software, and to permit persons to whom the Software is + furnished to do so, subject to the following conditions: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimers. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimers in the + documentation and/or other materials provided with the distribution. + + * Neither the names of the LLVM Team, University of Illinois at Urbana- + Champaign, nor the names of its contributors may be used to endorse or + promote products derived from this Software without specific prior + written permission. 
+ + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR + OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + DEALINGS WITH THE SOFTWARE. + + 5. Licensee's use (e.g. nvprof) of the PCRE third party + component is subject to the following terms and + conditions: + + ------------ + PCRE LICENCE + ------------ + PCRE is a library of functions to support regular expressions whose syntax + and semantics are as close as possible to those of the Perl 5 language. + Release 8 of PCRE is distributed under the terms of the "BSD" licence, as + specified below. The documentation for PCRE, supplied in the "doc" + directory, is distributed under the same terms as the software itself. The + basic library functions are written in C and are freestanding. Also + included in the distribution is a set of C++ wrapper functions, and a just- + in-time compiler that can be used to optimize pattern matching. These are + both optional features that can be omitted when the library is built. + + THE BASIC LIBRARY FUNCTIONS + --------------------------- + Written by: Philip Hazel + Email local part: ph10 + Email domain: cam.ac.uk + University of Cambridge Computing Service, + Cambridge, England. + Copyright (c) 1997-2012 University of Cambridge + All rights reserved. + + PCRE JUST-IN-TIME COMPILATION SUPPORT + ------------------------------------- + Written by: Zoltan Herczeg + Email local part: hzmester + Emain domain: freemail.hu + Copyright(c) 2010-2012 Zoltan Herczeg + All rights reserved. + + STACK-LESS JUST-IN-TIME COMPILER + -------------------------------- + Written by: Zoltan Herczeg + Email local part: hzmester + Emain domain: freemail.hu + Copyright(c) 2009-2012 Zoltan Herczeg + All rights reserved. + + THE C++ WRAPPER FUNCTIONS + ------------------------- + Contributed by: Google Inc. + Copyright (c) 2007-2012, Google Inc. + All rights reserved. + + THE "BSD" LICENCE + ----------------- + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the name of Google + Inc. nor the names of their contributors may be used to endorse or + promote products derived from this software without specific prior + written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + POSSIBILITY OF SUCH DAMAGE. + + 6. Some of the cuBLAS library routines were written by or + derived from code written by Vasily Volkov and are subject + to the Modified Berkeley Software Distribution License as + follows: + + Copyright (c) 2007-2009, Regents of the University of California + + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above + copyright notice, this list of conditions and the following + disclaimer in the documentation and/or other materials provided + with the distribution. + * Neither the name of the University of California, Berkeley nor + the names of its contributors may be used to endorse or promote + products derived from this software without specific prior + written permission. + + THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS OR + IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, + INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, + STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING + IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + POSSIBILITY OF SUCH DAMAGE. + + 7. Some of the cuBLAS library routines were written by or + derived from code written by Davide Barbieri and are + subject to the Modified Berkeley Software Distribution + License as follows: + + Copyright (c) 2008-2009 Davide Barbieri @ University of Rome Tor Vergata. + + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above + copyright notice, this list of conditions and the following + disclaimer in the documentation and/or other materials provided + with the distribution. + * The name of the author may not be used to endorse or promote + products derived from this software without specific prior + written permission. + + THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS OR + IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, + INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, + STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING + IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + POSSIBILITY OF SUCH DAMAGE. + + 8. Some of the cuBLAS library routines were derived from + code developed by the University of Tennessee and are + subject to the Modified Berkeley Software Distribution + License as follows: + + Copyright (c) 2010 The University of Tennessee. + + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above + copyright notice, this list of conditions and the following + disclaimer listed in this license in the documentation and/or + other materials provided with the distribution. + * Neither the name of the copyright holders nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + 9. Some of the cuBLAS library routines were written by or + derived from code written by Jonathan Hogg and are subject + to the Modified Berkeley Software Distribution License as + follows: + + Copyright (c) 2012, The Science and Technology Facilities Council (STFC). + + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above + copyright notice, this list of conditions and the following + disclaimer in the documentation and/or other materials provided + with the distribution. + * Neither the name of the STFC nor the names of its contributors + may be used to endorse or promote products derived from this + software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE STFC BE + LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, + WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE + OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN + IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + 10. Some of the cuBLAS library routines were written by or + derived from code written by Ahmad M. Abdelfattah, David + Keyes, and Hatem Ltaief, and are subject to the Apache + License, Version 2.0, as follows: + + -- (C) Copyright 2013 King Abdullah University of Science and Technology + Authors: + Ahmad Abdelfattah (ahmad.ahmad@kaust.edu.sa) + David Keyes (david.keyes@kaust.edu.sa) + Hatem Ltaief (hatem.ltaief@kaust.edu.sa) + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + * Neither the name of the King Abdullah University of Science and + Technology nor the names of its contributors may be used to endorse + or promote products derived from this software without specific prior + written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE + + 11. Some of the cuSPARSE library routines were written by or + derived from code written by Li-Wen Chang and are subject + to the NCSA Open Source License as follows: + + Copyright (c) 2012, University of Illinois. + + All rights reserved. + + Developed by: IMPACT Group, University of Illinois, http://impact.crhc.illinois.edu + + Permission is hereby granted, free of charge, to any person obtaining + a copy of this software and associated documentation files (the + "Software"), to deal with the Software without restriction, including + without limitation the rights to use, copy, modify, merge, publish, + distribute, sublicense, and/or sell copies of the Software, and to + permit persons to whom the Software is furnished to do so, subject to + the following conditions: + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ * Redistributions in binary form must reproduce the above + copyright notice, this list of conditions and the following + disclaimers in the documentation and/or other materials provided + with the distribution. + * Neither the names of IMPACT Group, University of Illinois, nor + the names of its contributors may be used to endorse or promote + products derived from this Software without specific prior + written permission. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + NONINFRINGEMENT. IN NO EVENT SHALL THE CONTRIBUTORS OR COPYRIGHT + HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER + IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR + IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE + SOFTWARE. + + 12. Some of the cuRAND library routines were written by or + derived from code written by Mutsuo Saito and Makoto + Matsumoto and are subject to the following license: + + Copyright (c) 2009, 2010 Mutsuo Saito, Makoto Matsumoto and Hiroshima + University. All rights reserved. + + Copyright (c) 2011 Mutsuo Saito, Makoto Matsumoto, Hiroshima + University and University of Tokyo. All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above + copyright notice, this list of conditions and the following + disclaimer in the documentation and/or other materials provided + with the distribution. + * Neither the name of the Hiroshima University nor the names of + its contributors may be used to endorse or promote products + derived from this software without specific prior written + permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + 13. Some of the cuRAND library routines were derived from + code developed by D. E. Shaw Research and are subject to + the following license: + + Copyright 2010-2011, D. E. Shaw Research. + + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + * Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. + * Redistributions in binary form must reproduce the above + copyright notice, this list of conditions, and the following + disclaimer in the documentation and/or other materials provided + with the distribution. + * Neither the name of D. E. 
Shaw Research nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + 14. Some of the Math library routines were written by or + derived from code developed by Norbert Juffa and are + subject to the following license: + + Copyright (c) 2015-2017, Norbert Juffa + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + 15. Licensee's use of the lz4 third party component is + subject to the following terms and conditions: + + Copyright (C) 2011-2013, Yann Collet. + BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php) + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above + copyright notice, this list of conditions and the following disclaimer + in the documentation and/or other materials provided with the + distribution. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT
+   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+  16. The NPP library uses code from the Boost Math Toolkit,
+      and is subject to the following license:
+
+   Boost Software License - Version 1.0 - August 17th, 2003
+   . . . .
+
+   Permission is hereby granted, free of charge, to any person or
+   organization obtaining a copy of the software and accompanying
+   documentation covered by this license (the "Software") to use,
+   reproduce, display, distribute, execute, and transmit the Software,
+   and to prepare derivative works of the Software, and to permit
+   third-parties to whom the Software is furnished to do so, all
+   subject to the following:
+
+   The copyright notices in the Software and this entire statement,
+   including the above license grant, this restriction and the following
+   disclaimer, must be included in all copies of the Software, in whole
+   or in part, and all derivative works of the Software, unless such
+   copies or derivative works are solely in the form of machine-executable
+   object code generated by a source language processor.
+
+   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND
+   NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR
+   ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR
+   OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING
+   FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+   OTHER DEALINGS IN THE SOFTWARE.
+
+  17. Portions of the Nsight Eclipse Edition are subject to the
+      following license:
+
+   The Eclipse Foundation makes available all content in this plug-in
+   ("Content"). Unless otherwise indicated below, the Content is provided
+   to you under the terms and conditions of the Eclipse Public License
+   Version 1.0 ("EPL"). A copy of the EPL is available at http://
+   www.eclipse.org/legal/epl-v10.html. For purposes of the EPL, "Program"
+   will mean the Content.
+
+   If you did not receive this Content directly from the Eclipse
+   Foundation, the Content is being redistributed by another party
+   ("Redistributor") and different terms and conditions may apply to your
+   use of any object code in the Content. Check the Redistributor's
+   license that was provided with the Content. If no such license exists,
+   contact the Redistributor. Unless otherwise indicated below, the terms
+   and conditions of the EPL still apply to any source code in the
+   Content and such source code may be obtained at http://www.eclipse.org.
+
+  18. Some of the cuBLAS library routines use code from
+      OpenAI, which is subject to the following license:
+
+   License URL
+   https://github.com/openai/openai-gemm/blob/master/LICENSE
+
+   License Text
+   The MIT License
+
+   Copyright (c) 2016 OpenAI (http://openai.com), 2016 Google Inc.
+
+   Permission is hereby granted, free of charge, to any person obtaining a copy
+   of this software and associated documentation files (the "Software"), to deal
+   in the Software without restriction, including without limitation the rights
+   to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+   copies of the Software, and to permit persons to whom the Software is
+   furnished to do so, subject to the following conditions:
+
+   The above copyright notice and this permission notice shall be included in
+   all copies or substantial portions of the Software.
+
+   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+   IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+   FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+   AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+   OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+   THE SOFTWARE.
+
+  19. Licensee's use of the Visual Studio Setup Configuration
+      Samples is subject to the following license:
+
+   The MIT License (MIT)
+   Copyright (C) Microsoft Corporation. All rights reserved.
+
+   Permission is hereby granted, free of charge, to any person
+   obtaining a copy of this software and associated documentation
+   files (the "Software"), to deal in the Software without restriction,
+   including without limitation the rights to use, copy, modify, merge,
+   publish, distribute, sublicense, and/or sell copies of the Software,
+   and to permit persons to whom the Software is furnished to do so,
+   subject to the following conditions:
+
+   The above copyright notice and this permission notice shall be included
+   in all copies or substantial portions of the Software.
+
+   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+   OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+   FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+   AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+   OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+  20. Licensee's use of the linmath.h header for CPU functions for
+      GL vector/matrix operations from LunarG is subject to the
+      Apache License Version 2.0.
+
+  21. The DX12-CUDA sample uses the d3dx12.h header, which is
+      subject to the MIT license.
+ +----------------- diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ee73fb8f81b53aae2a5bfc07dd96bf4b19462df7 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/__version__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/__version__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..782d31ba9cea785da46ad98eeca0a2842b202851 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/__version__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/_internal_utils.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/_internal_utils.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..abfe5f66a4d46802539883c3cd6bb46d85446cb4 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/_internal_utils.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/adapters.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/adapters.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3776cd1eed95f1f3c8fb70fc06a21f2967ea1ab1 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/adapters.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/api.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/api.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..929c60a709278b1a138e2cf6e46c5eacffa748e6 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/api.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/auth.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/auth.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ec873b2d9b5bd268abce78625c3937cbc32bda21 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/auth.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/certs.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/certs.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..2b669054b98f68eb90c7ca727408fdaa0bca6f4d Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/certs.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/compat.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/compat.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..fdb4cb8decc5c11413237ecc02fe0bb36cd0920a Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/compat.cpython-312.pyc differ diff --git 
a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/cookies.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/cookies.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..01a7977233a8a3df83d3f50cdc51c57527b4c6e7 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/cookies.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/exceptions.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/exceptions.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..fb2d1666c9deb9176e772fa1e73a142ca454a7bc Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/exceptions.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/help.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/help.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e6d3753e88f2178b94dc81b65da4bb93a8d7ffa1 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/help.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/hooks.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/hooks.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..418b57f7367741027c6a1640f4aaee2f0eeb7dec Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/hooks.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/models.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/models.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..849af323c5381dfafc6e6d9e277b6286ce859d01 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/models.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/packages.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/packages.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..a81794673fdecdf911d557e09e02ece56633aaab Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/packages.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/sessions.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/sessions.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..19f43a256b4a1888a470cdda5846b4b14dffd0b4 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/sessions.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/status_codes.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/status_codes.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..2aac3ae9628656701caee3ba02d73eb8282b5ab2 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/status_codes.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/structures.cpython-312.pyc 
b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/structures.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..7af3d5c6ab2da5321b36224497dde2839357b822 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/structures.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/utils.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/utils.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..f386d3616398d5f432bbc3a76cb25b494e069cae Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/requests/__pycache__/utils.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torch-2.10.0+cu128.dist-info/licenses/LICENSE b/URSA/.venv_ursa/lib/python3.12/site-packages/torch-2.10.0+cu128.dist-info/licenses/LICENSE new file mode 100644 index 0000000000000000000000000000000000000000..47bee4248ac8175dc1e9fe294be040b56d4216e3 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torch-2.10.0+cu128.dist-info/licenses/LICENSE @@ -0,0 +1,8961 @@ +From PyTorch: + +Copyright (c) 2016- Facebook, Inc (Adam Paszke) +Copyright (c) 2014- Facebook, Inc (Soumith Chintala) +Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert) +Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu) +Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu) +Copyright (c) 2011-2013 NYU (Clement Farabet) +Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston) +Copyright (c) 2006 Idiap Research Institute (Samy Bengio) +Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz) + +From Caffe2: + +Copyright (c) 2016-present, Facebook Inc. All rights reserved. + +All contributions by Facebook: +Copyright (c) 2016 Facebook Inc. + +All contributions by Google: +Copyright (c) 2015 Google Inc. +All rights reserved. + +All contributions by Yangqing Jia: +Copyright (c) 2015 Yangqing Jia +All rights reserved. + +All contributions by Kakao Brain: +Copyright 2019-2020 Kakao Brain + +All contributions by Cruise LLC: +Copyright (c) 2022 Cruise LLC. +All rights reserved. + +All contributions by Tri Dao: +Copyright (c) 2024 Tri Dao. +All rights reserved. + +All contributions by Arm: +Copyright (c) 2021, 2023-2025 Arm Limited and/or its affiliates + +All contributions from Caffe: +Copyright(c) 2013, 2014, 2015, the respective contributors +All rights reserved. + +All other contributions: +Copyright(c) 2015, 2016 the respective contributors +All rights reserved. + +Caffe2 uses a copyright model similar to Caffe: each contributor holds +copyright over their contributions to Caffe2. The project versioning records +all such contribution and copyright details. If a contributor wants to further +mark their specific copyright on a particular contribution, they should +indicate their copyright solely in the commit message of the change when it is +committed. + +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + +2. 
Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + +3. Neither the names of Facebook, Deepmind Technologies, NYU, NEC Laboratories America + and IDIAP Research Institute nor the names of its contributors may be + used to endorse or promote products derived from this software without + specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. + + +The PyTorch repository and source distributions bundle several libraries that are +compatibly licensed. We list these here. + +Name: DCGM +License: Apache-2.0 +Files: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM + For details, see the files concatenated below: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM/LICENSE + +Name: FP16 +License: MIT +Files: /pytorch/third_party/FP16 + For details, see the files concatenated below: /pytorch/third_party/FP16/LICENSE + +Name: FXdiv +License: MIT +Files: /pytorch/third_party/FXdiv + For details, see the files concatenated below: /pytorch/third_party/FXdiv/LICENSE + +Name: NNPACK +License: BSD-2-Clause +Files: /pytorch/third_party/NNPACK + For details, see the files concatenated below: /pytorch/third_party/NNPACK/LICENSE + +Name: NVTX +License: Apache-2.0 with exception +Files: /pytorch/third_party/NVTX + For details, see the files concatenated below: /pytorch/third_party/NVTX/LICENSE.txt + +Name: VulkanMemoryAllocator +License: MIT +Files: /pytorch/third_party/VulkanMemoryAllocator + For details, see the files concatenated below: /pytorch/third_party/VulkanMemoryAllocator/LICENSE.txt + +Name: XNNPACK +License: BSD-3-Clause +Files: /pytorch/third_party/XNNPACK + For details, see the files concatenated below: /pytorch/third_party/XNNPACK/LICENSE + +Name: aiter +License: MIT +Files: /pytorch/third_party/aiter + For details, see the files concatenated below: /pytorch/third_party/aiter/LICENSE + +Name: benchmark +License: Apache-2.0 +Files: /pytorch/third_party/benchmark, + /pytorch/third_party/opentelemetry-cpp/third_party/benchmark, + /pytorch/third_party/protobuf/third_party/benchmark + For details, see the files concatenated below: /pytorch/third_party/benchmark/LICENSE, + /pytorch/third_party/opentelemetry-cpp/third_party/benchmark/LICENSE, + /pytorch/third_party/protobuf/third_party/benchmark/LICENSE + +Name: boost-vcpkg-helpers +License: MIT +Files: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/boost-vcpkg-helpers + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/boost-vcpkg-helpers/LICENSE.txt + +Name: cJSON +License: MIT +Files: 
/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb/examples/rest/cJSON, + /pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb/examples/rest/cJSON + For details, see the files concatenated below: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb/examples/rest/cJSON/LICENSE, + /pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb/examples/rest/cJSON/LICENSE + +Name: catch2 +License: BSL-1.0 +Files: /pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp/3rd_party/include/opentracing/catch2 + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp/3rd_party/include/opentracing/catch2/LICENSE.txt + +Name: clog +License: BSD-2-Clause +Files: /pytorch/third_party/cpuinfo/deps/clog, + /pytorch/third_party/fbgemm/external/cpuinfo/deps/clog + For details, see the files concatenated below: /pytorch/third_party/cpuinfo/deps/clog/LICENSE, + /pytorch/third_party/fbgemm/external/cpuinfo/deps/clog/LICENSE + +Name: colorama +License: BSD-3-Clause +Files: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM/testing/python3/libs_3rdparty/colorama + For details, see the files concatenated below: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM/testing/python3/libs_3rdparty/colorama/LICENSE.txt + +Name: composable_kernel +License: MIT +Files: /pytorch/third_party/aiter/3rdparty/composable_kernel, + /pytorch/third_party/composable_kernel, + /pytorch/third_party/fbgemm/external/composable_kernel, + /pytorch/third_party/flash-attention/csrc/composable_kernel + For details, see the files concatenated below: /pytorch/third_party/aiter/3rdparty/composable_kernel/LICENSE, + /pytorch/third_party/composable_kernel/LICENSE, + /pytorch/third_party/fbgemm/external/composable_kernel/LICENSE, + /pytorch/third_party/flash-attention/csrc/composable_kernel/LICENSE + +Name: cpp-httplib +License: MIT +Files: /pytorch/third_party/cpp-httplib + For details, see the files concatenated below: /pytorch/third_party/cpp-httplib/LICENSE + +Name: cpplint +License: BSD-3-Clause +Files: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json/third_party/cpplint + For details, see the files concatenated below: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json/third_party/cpplint/LICENSE + +Name: cpr +License: MIT +Files: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr + For details, see the files concatenated below: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr/LICENSE + +Name: cpuinfo +License: BSD-2-Clause +Files: /pytorch/third_party/cpuinfo, + /pytorch/third_party/fbgemm/external/cpuinfo + For details, see the files concatenated below: /pytorch/third_party/cpuinfo/LICENSE, + /pytorch/third_party/fbgemm/external/cpuinfo/LICENSE + +Name: cudnn_frontend +License: MIT +Files: /pytorch/third_party/cudnn_frontend + For details, see the files concatenated below: /pytorch/third_party/cudnn_frontend/LICENSE.txt + +Name: cutlass +License: BSD-3-Clause +Files: /pytorch/third_party/cutlass, + /pytorch/third_party/fbgemm/external/cutlass, + /pytorch/third_party/flash-attention/csrc/cutlass + For details, see the files concatenated below: /pytorch/third_party/cutlass/LICENSE.txt, + /pytorch/third_party/fbgemm/external/cutlass/LICENSE.txt, + 
/pytorch/third_party/flash-attention/csrc/cutlass/LICENSE.txt + +Name: dart +License: Apache-2.0 +Files: /pytorch/third_party/flatbuffers/dart + For details, see the files concatenated below: /pytorch/third_party/flatbuffers/dart/LICENSE + +Name: docs +License: Apache-2.0 with exception +Files: /pytorch/third_party/NVTX/docs + For details, see the files concatenated below: /pytorch/third_party/NVTX/docs/LICENSE.txt + +Name: doctest +License: MIT +Files: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json/test/thirdparty/doctest + For details, see the files concatenated below: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json/test/thirdparty/doctest/LICENSE.txt + +Name: duktape-1.5.2 +License: MIT +Files: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb/src/third_party/duktape-1.5.2, + /pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb/src/third_party/duktape-1.5.2 + For details, see the files concatenated below: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb/src/third_party/duktape-1.5.2/LICENSE.txt, + /pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb/src/third_party/duktape-1.5.2/LICENSE.txt + +Name: duktape-1.8.0 +License: MIT +Files: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb/src/third_party/duktape-1.8.0, + /pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb/src/third_party/duktape-1.8.0 + For details, see the files concatenated below: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb/src/third_party/duktape-1.8.0/LICENSE.txt, + /pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb/src/third_party/duktape-1.8.0/LICENSE.txt + +Name: dynolog +License: MIT +Files: /pytorch/third_party/kineto/libkineto/third_party/dynolog + For details, see the files concatenated below: /pytorch/third_party/kineto/libkineto/third_party/dynolog/LICENSE + +Name: etw +License: MIT +Files: /pytorch/third_party/opentelemetry-cpp/exporters/etw/include/opentelemetry/exporters/etw + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/exporters/etw/include/opentelemetry/exporters/etw/LICENSE + +Name: expected +License: MIT +Files: /pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp/3rd_party/include/opentracing/expected + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp/3rd_party/include/opentracing/expected/LICENSE + +Name: fbgemm +License: BSD-3-Clause +Files: /pytorch/third_party/fbgemm + For details, see the files concatenated below: /pytorch/third_party/fbgemm/LICENSE + +Name: ffnvcodec +License: MIT with exception +Files: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/ffnvcodec + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/ffnvcodec/LICENSE.txt + +Name: flash-attention +License: BSD-3-Clause +Files: /pytorch/third_party/flash-attention + For details, see the files concatenated below: /pytorch/third_party/flash-attention/LICENSE + +Name: flatbuffers +License: Apache-2.0 +Files: /pytorch/third_party/flatbuffers + For details, see the files concatenated below: /pytorch/third_party/flatbuffers/LICENSE + +Name: fmt +License: MIT with 
exception +Files: /pytorch/third_party/fmt, + /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt, + /pytorch/third_party/kineto/libkineto/third_party/fmt + For details, see the files concatenated below: /pytorch/third_party/fmt/LICENSE, + /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt/LICENSE.rst, + /pytorch/third_party/kineto/libkineto/third_party/fmt/LICENSE + +Name: gemmlowp +License: Apache-2.0 +Files: /pytorch/third_party/gemmlowp/gemmlowp + For details, see the files concatenated below: /pytorch/third_party/gemmlowp/gemmlowp/LICENSE + +Name: generator +License: Apache-2.0 +Files: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest/googlemock/scripts/generator, + /pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest/googlemock/scripts/generator, + /pytorch/third_party/protobuf/third_party/googletest/googlemock/scripts/generator, + /pytorch/third_party/tensorpipe/third_party/googletest/googlemock/scripts/generator + For details, see the files concatenated below: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest/googlemock/scripts/generator/LICENSE, + /pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest/googlemock/scripts/generator/LICENSE, + /pytorch/third_party/protobuf/third_party/googletest/googlemock/scripts/generator/LICENSE, + /pytorch/third_party/tensorpipe/third_party/googletest/googlemock/scripts/generator/LICENSE + +Name: gettimeofday +License: Apache-2.0 +Files: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/gettimeofday + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/gettimeofday/LICENSE + +Name: gloo +License: BSD-3-Clause +Files: /pytorch/third_party/gloo + For details, see the files concatenated below: /pytorch/third_party/gloo/LICENSE + +Name: googlemock +License: BSD-3-Clause +Files: /pytorch/third_party/protobuf/third_party/googletest/googlemock, + /pytorch/third_party/tensorpipe/third_party/googletest/googlemock + For details, see the files concatenated below: /pytorch/third_party/protobuf/third_party/googletest/googlemock/LICENSE, + /pytorch/third_party/tensorpipe/third_party/googletest/googlemock/LICENSE + +Name: googletest +License: BSD-3-Clause +Files: /pytorch/third_party/fbgemm/external/googletest, + /pytorch/third_party/googletest, + /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest, + /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest, + /pytorch/third_party/kineto/libkineto/third_party/googletest, + /pytorch/third_party/opentelemetry-cpp/third_party/googletest, + /pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest, + /pytorch/third_party/protobuf/third_party/googletest, + /pytorch/third_party/protobuf/third_party/googletest/googletest, + /pytorch/third_party/tensorpipe/third_party/googletest, + /pytorch/third_party/tensorpipe/third_party/googletest/googletest + For details, see the files concatenated below: /pytorch/third_party/fbgemm/external/googletest/LICENSE, + /pytorch/third_party/googletest/LICENSE, + /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest/LICENSE, + /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest/LICENSE, + 
/pytorch/third_party/kineto/libkineto/third_party/googletest/LICENSE, + /pytorch/third_party/opentelemetry-cpp/third_party/googletest/LICENSE, + /pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest/LICENSE, + /pytorch/third_party/protobuf/third_party/googletest/LICENSE, + /pytorch/third_party/protobuf/third_party/googletest/googletest/LICENSE, + /pytorch/third_party/tensorpipe/third_party/googletest/LICENSE, + /pytorch/third_party/tensorpipe/third_party/googletest/googletest/LICENSE + +Name: gtest +License: BSD-3-Clause +Files: /pytorch/third_party/ideep/mkl-dnn/tests/gtests/gtest + For details, see the files concatenated below: /pytorch/third_party/ideep/mkl-dnn/tests/gtests/gtest/LICENSE + +Name: hipify_torch +License: MIT +Files: /pytorch/third_party/fbgemm/external/hipify_torch + For details, see the files concatenated below: /pytorch/third_party/fbgemm/external/hipify_torch/LICENSE.txt + +Name: hstu +License: BSD-3-Clause +Files: /pytorch/third_party/fbgemm/fbgemm_gpu/experimental/hstu + For details, see the files concatenated below: /pytorch/third_party/fbgemm/fbgemm_gpu/experimental/hstu/LICENSE + +Name: hungarian +License: Permissive (free to use) +Files: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/hungarian + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/hungarian/LICENSE.txt + +Name: ideep +License: MIT +Files: /pytorch/third_party/ideep + For details, see the files concatenated below: /pytorch/third_party/ideep/LICENSE + +Name: irrlicht +License: MIT +Files: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/irrlicht + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/irrlicht/LICENSE.txt + +Name: kineto +License: BSD-3-Clause +Files: /pytorch/third_party/kineto + For details, see the files concatenated below: /pytorch/third_party/kineto/LICENSE + +Name: libnop +License: Apache-2.0 +Files: /pytorch/third_party/tensorpipe/third_party/libnop + For details, see the files concatenated below: /pytorch/third_party/tensorpipe/third_party/libnop/LICENSE + +Name: libstemmer +License: BSD-3-Clause +Files: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/libstemmer + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/libstemmer/LICENSE + +Name: libuv +License: MIT +Files: /pytorch/third_party/tensorpipe/third_party/libuv + For details, see the files concatenated below: /pytorch/third_party/tensorpipe/third_party/libuv/LICENSE + +Name: mimalloc +License: MIT +Files: /pytorch/third_party/mimalloc + For details, see the files concatenated below: /pytorch/third_party/mimalloc/LICENSE + +Name: miniz-3.0.2 +License: MIT +Files: /pytorch/third_party/miniz-3.0.2 + For details, see the files concatenated below: /pytorch/third_party/miniz-3.0.2/LICENSE + +Name: mkl-dnn +License: Apache-2.0 +Files: /pytorch/third_party/ideep/mkl-dnn + For details, see the files concatenated below: /pytorch/third_party/ideep/mkl-dnn/LICENSE + +Name: ms-gsl +License: MIT +Files: /pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl/LICENSE + +Name: mx +License: MIT +Files: /pytorch/third_party/fbgemm/fbgemm_gpu/src/quantize_ops/mx, + /pytorch/third_party/fbgemm/fbgemm_gpu/test/quantize/mx + For details, see the files concatenated below: 
/pytorch/third_party/fbgemm/fbgemm_gpu/src/quantize_ops/mx/LICENSE, + /pytorch/third_party/fbgemm/fbgemm_gpu/test/quantize/mx/LICENSE + +Name: onnx +License: Apache-2.0 +Files: /pytorch/third_party/onnx + For details, see the files concatenated below: /pytorch/third_party/onnx/LICENSE + +Name: opentelemetry-cpp +License: Apache-2.0 +Files: /pytorch/third_party/opentelemetry-cpp + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/LICENSE + +Name: opentelemetry-proto +License: Apache-2.0 +Files: /pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto/LICENSE + +Name: opentracing-cpp +License: Apache-2.0 +Files: /pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp/LICENSE + +Name: pdcurses +License: Public Domain for core +Files: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/pdcurses + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/pdcurses/LICENSE + +Name: pfs +License: Apache-2.0 +Files: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs + For details, see the files concatenated below: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs/LICENSE + +Name: physac +License: MIT +Files: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/physac + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/physac/LICENSE + +Name: pqp +License: Apache-2.0 +Files: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/pqp + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/pqp/LICENSE + +Name: prometheus-cpp +License: MIT +Files: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp, + /pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp + For details, see the files concatenated below: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/LICENSE, + /pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/LICENSE + +Name: protobuf +License: BSD-3-Clause +Files: /pytorch/third_party/protobuf + For details, see the files concatenated below: /pytorch/third_party/protobuf/LICENSE + +Name: psimd +License: MIT +Files: /pytorch/third_party/psimd + For details, see the files concatenated below: /pytorch/third_party/psimd/LICENSE + +Name: pthreadpool +License: BSD-2-Clause +Files: /pytorch/third_party/pthreadpool + For details, see the files concatenated below: /pytorch/third_party/pthreadpool/LICENSE + +Name: pybind11 +License: BSD-3-Clause +Files: /pytorch/third_party/onnx/third_party/pybind11, + /pytorch/third_party/pybind11, + /pytorch/third_party/tensorpipe/third_party/pybind11 + For details, see the files concatenated below: /pytorch/third_party/onnx/third_party/pybind11/LICENSE, + /pytorch/third_party/pybind11/LICENSE, + /pytorch/third_party/tensorpipe/third_party/pybind11/LICENSE + +Name: python +License: Apache-2.0 with exception +Files: /pytorch/third_party/NVTX/python + For details, see the files concatenated below: /pytorch/third_party/NVTX/python/LICENSE.txt + +Name: python +License: BSD-3-Clause +Files: /pytorch/third_party/cutlass/python + For details, see the files concatenated below: 
/pytorch/third_party/cutlass/python/LICENSE.txt + +Name: python +License: BSD-3-Clause +Files: /pytorch/third_party/fbgemm/external/cutlass/python + For details, see the files concatenated below: /pytorch/third_party/fbgemm/external/cutlass/python/LICENSE.txt + +Name: python +License: BSD-3-Clause +Files: /pytorch/third_party/flash-attention/csrc/cutlass/python + For details, see the files concatenated below: /pytorch/third_party/flash-attention/csrc/cutlass/python/LICENSE.txt + +Name: python-peachpy +License: BSD-2-Clause +Files: /pytorch/third_party/python-peachpy + For details, see the files concatenated below: /pytorch/third_party/python-peachpy/LICENSE.rst + +Name: sigslot +License: Public Domain +Files: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/sigslot + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/sigslot/LICENSE + +Name: sleef +License: BSL-1.0 +Files: /pytorch/third_party/sleef + For details, see the files concatenated below: /pytorch/third_party/sleef/LICENSE.txt + +Name: swift +License: Apache-2.0 +Files: /pytorch/third_party/flatbuffers/swift + For details, see the files concatenated below: /pytorch/third_party/flatbuffers/swift/LICENSE + +Name: tb_plugin +License: BSD-3-Clause +Files: /pytorch/third_party/kineto/tb_plugin + For details, see the files concatenated below: /pytorch/third_party/kineto/tb_plugin/LICENSE + +Name: tensorflow-common +License: MIT +Files: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/tensorflow-common + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/tensorflow-common/LICENSE.txt + +Name: tensorpipe +License: BSD-3-Clause +Files: /pytorch/third_party/tensorpipe + For details, see the files concatenated below: /pytorch/third_party/tensorpipe/LICENSE.txt + +Name: test +License: MIT with exception +Files: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr/test + For details, see the files concatenated below: /pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr/test/LICENSE + +Name: variant +License: BSD-3-Clause +Files: /pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp/3rd_party/include/opentracing/variant + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp/3rd_party/include/opentracing/variant/LICENSE + +Name: vcpkg +License: MIT +Files: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/LICENSE.txt + +Name: vulkan +License: Apache-2.0 with exception +Files: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/vulkan + For details, see the files concatenated below: /pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/vulkan/LICENSE.txt + +/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM/LICENSE +---------------------------------------------------------------------------------- +Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+See the License for the specific language governing permissions and +limitations under the License. + + +/pytorch/third_party/FP16/LICENSE +--------------------------------- +The MIT License (MIT) + +Copyright (c) 2017 Facebook Inc. +Copyright (c) 2017 Georgia Institute of Technology +Copyright 2019 Google LLC + +Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + + +/pytorch/third_party/FXdiv/LICENSE +---------------------------------- +The MIT License (MIT) + +Copyright (c) 2017 Facebook Inc. +Copyright (c) 2016-2017 Marat Dukhan + +Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + + +/pytorch/third_party/NNPACK/LICENSE +----------------------------------- +Copyright (c) 2017 Facebook Inc. +Copyright (c) 2015-2017, Georgia Institute of Technology +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +* Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/NVTX/LICENSE.txt +------------------------------------- +============================================================================== +NVTX is under the Apache License v2.0 with LLVM Exceptions: +============================================================================== + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. 
The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + + +---- LLVM Exceptions to the Apache 2.0 License ---- + +As an exception, if, as a result of your compiling your source code, portions +of this Software are embedded into an Object form of such source code, you +may redistribute such embedded portions in such Object form without complying +with the conditions of Sections 4(a), 4(b) and 4(d) of the License. + +In addition, if you combine or link compiled forms of this Software with +software that is licensed under the GPLv2 ("Combined Software") and if a +court of competent jurisdiction determines that the patent provision (Section +3), the indemnity provision (Section 9) or other Section of the License +conflicts with the conditions of the GPLv2, you may retroactively and +prospectively choose to deem waived or otherwise exclude such Section(s) of +the License, but only in their entirety and only with respect to the Combined +Software. + + + +/pytorch/third_party/VulkanMemoryAllocator/LICENSE.txt +------------------------------------------------------ +Copyright (c) 2017-2025 Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. + + +/pytorch/third_party/XNNPACK/LICENSE +------------------------------------ +BSD License + +For XNNPACK software + +Copyright (c) Facebook, Inc. and its affiliates. All rights reserved. 
+Copyright 2019 Google LLC + +Redistribution and use in source and binary forms, with or without modification, +are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + + * Neither the name Facebook nor the names of its contributors may be used to + endorse or promote products derived from this software without specific + prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR +ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON +ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/aiter/LICENSE +---------------------------------- +Copyright © Advanced Micro Devices, Inc. All rights reserved. + +MIT License + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + + +/pytorch/third_party/benchmark/LICENSE +-------------------------------------- + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. 
For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. 
This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+ + +/pytorch/third_party/opentelemetry-cpp/third_party/benchmark/LICENSE +-------------------------------------------------------------------- + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. 
Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. 
+ You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + + +/pytorch/third_party/protobuf/third_party/benchmark/LICENSE +----------------------------------------------------------- + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. 
The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + + +/pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/boost-vcpkg-helpers/LICENSE.txt +---------------------------------------------------------------------------------------- +Copyright (c) Microsoft Corporation + +All rights reserved. + +MIT License + +Permission is hereby granted, free of charge, to any person obtaining a copy of +this software and associated documentation files (the "Software"), to deal in +the Software without restriction, including without limitation the rights to +use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies +of the Software, and to permit persons to whom the Software is furnished to do +so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + + +/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb/examples/rest/cJSON/LICENSE +---------------------------------------------------------------------------------------------------------------------------------- +Copyright (c) 2009-2017 Dave Gamble and cJSON contributors + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. 
+ +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. + + + +/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb/examples/rest/cJSON/LICENSE +--------------------------------------------------------------------------------------------------------------- +Copyright (c) 2009-2017 Dave Gamble and cJSON contributors + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. + + + +/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp/3rd_party/include/opentracing/catch2/LICENSE.txt +------------------------------------------------------------------------------------------------------------------- +Boost Software License - Version 1.0 - August 17th, 2003 + +Permission is hereby granted, free of charge, to any person or organization +obtaining a copy of the software and accompanying documentation covered by +this license (the "Software") to use, reproduce, display, distribute, +execute, and transmit the Software, and to prepare derivative works of the +Software, and to permit third-parties to whom the Software is furnished to +do so, all subject to the following: + +The copyright notices in the Software and this entire statement, including +the above license grant, this restriction and the following disclaimer, +must be included in all copies of the Software, in whole or in part, and +all derivative works of the Software, unless such copies or derivative +works are solely in the form of machine-executable object code generated by +a source language processor. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT +SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE +FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, +ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +DEALINGS IN THE SOFTWARE. 
+ + +/pytorch/third_party/cpuinfo/deps/clog/LICENSE +---------------------------------------------- +Copyright (C) 2018 Marat Dukhan +Copyright (c) 2017-2018 Facebook Inc. +Copyright (c) 2017 Georgia Institute of Technology + +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +* Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/fbgemm/external/cpuinfo/deps/clog/LICENSE +-------------------------------------------------------------- +Copyright (C) 2018 Marat Dukhan +Copyright (c) 2017-2018 Facebook Inc. +Copyright (c) 2017 Georgia Institute of Technology + +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +* Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ + +/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM/testing/python3/libs_3rdparty/colorama/LICENSE.txt +----------------------------------------------------------------------------------------------------------------------------- +Copyright (c) 2010 Jonathan Hartley + +Released under the New BSD license (reproduced below), or alternatively you may +use this software under any OSI approved open source license such as those at +http://opensource.org/licenses/alphabetical + +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +* Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +* Neither the name(s) of the copyright holders, nor those of its contributors + may be used to endorse or promote products derived from this software without + specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + + +/pytorch/third_party/aiter/3rdparty/composable_kernel/LICENSE +------------------------------------------------------------- +Copyright (c) 2018- , Advanced Micro Devices, Inc. (Chao Liu, Jing Zhang) +Copyright (c) 2019- , Advanced Micro Devices, Inc. (Letao Qin, Qianfeng Zhang, Liang Huang, Shaojie Wang) +Copyright (c) 2022- , Advanced Micro Devices, Inc. (Anthony Chang, Chunyu Lai, Illia Silin, Adam Osewski, Poyen Chen, Jehandad Khan) +Copyright (c) 2019-2021, Advanced Micro Devices, Inc. (Hanwen Chang) +Copyright (c) 2019-2020, Advanced Micro Devices, Inc. (Tejash Shah) +Copyright (c) 2020 , Advanced Micro Devices, Inc. (Xiaoyan Zhou) +Copyright (c) 2021-2022, Advanced Micro Devices, Inc. (Jianfeng Yan) + +SPDX-License-Identifier: MIT +Copyright (c) 2018-2025, Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. 
+ +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + + +/pytorch/third_party/composable_kernel/LICENSE +---------------------------------------------- +Copyright (c) 2018- , Advanced Micro Devices, Inc. (Chao Liu, Jing Zhang) +Copyright (c) 2019- , Advanced Micro Devices, Inc. (Letao Qin, Qianfeng Zhang, Liang Huang, Shaojie Wang) +Copyright (c) 2022- , Advanced Micro Devices, Inc. (Anthony Chang, Chunyu Lai, Illia Silin, Adam Osewski, Poyen Chen, Jehandad Khan) +Copyright (c) 2019-2021, Advanced Micro Devices, Inc. (Hanwen Chang) +Copyright (c) 2019-2020, Advanced Micro Devices, Inc. (Tejash Shah) +Copyright (c) 2020 , Advanced Micro Devices, Inc. (Xiaoyan Zhou) +Copyright (c) 2021-2022, Advanced Micro Devices, Inc. (Jianfeng Yan) + +SPDX-License-Identifier: MIT +Copyright (c) 2018-2025, Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + + +/pytorch/third_party/fbgemm/external/composable_kernel/LICENSE +-------------------------------------------------------------- +Copyright (c) 2018- , Advanced Micro Devices, Inc. (Chao Liu, Jing Zhang) +Copyright (c) 2019- , Advanced Micro Devices, Inc. (Letao Qin, Qianfeng Zhang, Liang Huang, Shaojie Wang) +Copyright (c) 2022- , Advanced Micro Devices, Inc. (Anthony Chang, Chunyu Lai, Illia Silin, Adam Osewski, Poyen Chen, Jehandad Khan) +Copyright (c) 2019-2021, Advanced Micro Devices, Inc. (Hanwen Chang) +Copyright (c) 2019-2020, Advanced Micro Devices, Inc. (Tejash Shah) +Copyright (c) 2020 , Advanced Micro Devices, Inc. (Xiaoyan Zhou) +Copyright (c) 2021-2022, Advanced Micro Devices, Inc. (Jianfeng Yan) + +SPDX-License-Identifier: MIT +Copyright (c) 2018-2025, Advanced Micro Devices, Inc. All rights reserved. 
+ +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + + +/pytorch/third_party/flash-attention/csrc/composable_kernel/LICENSE +------------------------------------------------------------------- +Copyright (c) 2018- , Advanced Micro Devices, Inc. (Chao Liu, Jing Zhang) +Copyright (c) 2019- , Advanced Micro Devices, Inc. (Letao Qin, Qianfeng Zhang, Liang Huang, Shaojie Wang) +Copyright (c) 2022- , Advanced Micro Devices, Inc. (Anthony Chang, Chunyu Lai, Illia Silin, Adam Osewski, Poyen Chen, Jehandad Khan) +Copyright (c) 2019-2021, Advanced Micro Devices, Inc. (Hanwen Chang) +Copyright (c) 2019-2020, Advanced Micro Devices, Inc. (Tejash Shah) +Copyright (c) 2020 , Advanced Micro Devices, Inc. (Xiaoyan Zhou) +Copyright (c) 2021-2022, Advanced Micro Devices, Inc. (Jianfeng Yan) + +SPDX-License-Identifier: MIT +Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. 
+ + +/pytorch/third_party/cpp-httplib/LICENSE +---------------------------------------- +The MIT License (MIT) + +Copyright (c) 2017 yhirose + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + + + +/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json/third_party/cpplint/LICENSE +------------------------------------------------------------------------------------------------------ +cpplint.py and its corresponding unit tests are Copyright (C) 2009 Google Inc. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + * Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the following disclaimer +in the documentation and/or other materials provided with the +distribution. + * Neither the name of Google Inc. nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr/LICENSE +--------------------------------------------------------------------------------- +This license applies to everything except the contents of the "test" +directory and its subdirectories. 
+ +MIT License + +Copyright (c) 2017-2021 Huu Nguyen +Copyright (c) 2022 libcpr and many other contributors + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + +/pytorch/third_party/cpuinfo/LICENSE +------------------------------------ +Copyright (c) 2019 Google LLC +Copyright (c) 2017-2018 Facebook Inc. +Copyright (C) 2012-2017 Georgia Institute of Technology +Copyright (C) 2010-2012 Marat Dukhan + +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +* Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/fbgemm/external/cpuinfo/LICENSE +---------------------------------------------------- +Copyright (c) 2019 Google LLC +Copyright (c) 2017-2018 Facebook Inc. +Copyright (C) 2012-2017 Georgia Institute of Technology +Copyright (C) 2010-2012 Marat Dukhan + +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +* Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. 
+ +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/cudnn_frontend/LICENSE.txt +----------------------------------------------- +/* + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ + + +/pytorch/third_party/cutlass/LICENSE.txt +---------------------------------------- +Copyright (c) 2017 - 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +SPDX-License-Identifier: BSD-3-Clause + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this +list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, +this list of conditions and the following disclaimer in the documentation +and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Certain files within this repository are subject to separate licensing terms: + +- The files located in the `python/CuTeDSL` directory are licensed under the + NVIDIA End User License Agreement (EULA). Please refer to + https://docs.nvidia.com/cutlass/media/docs/pythonDSL/license.html + for the full terms. + + +/pytorch/third_party/fbgemm/external/cutlass/LICENSE.txt +-------------------------------------------------------- +Copyright (c) 2017 - 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +SPDX-License-Identifier: BSD-3-Clause + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this +list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, +this list of conditions and the following disclaimer in the documentation +and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Certain files within this repository are subject to separate licensing terms: + +- The files located in the `python/CuTeDSL` directory are licensed under the + NVIDIA End User License Agreement (EULA). Please refer to + https://docs.nvidia.com/cutlass/media/docs/pythonDSL/license.html + for the full terms. + + +/pytorch/third_party/flash-attention/csrc/cutlass/LICENSE.txt +------------------------------------------------------------- +Copyright (c) 2017 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +SPDX-License-Identifier: BSD-3-Clause + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this +list of conditions and the following disclaimer. + +2. 
Redistributions in binary form must reproduce the above copyright notice, +this list of conditions and the following disclaimer in the documentation +and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/flatbuffers/dart/LICENSE +--------------------------------------------- + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. 
+ + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright 2014 Google Inc. + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + + +/pytorch/third_party/NVTX/docs/LICENSE.txt +------------------------------------------ +============================================================================== +NVTX is under the Apache License v2.0 with LLVM Exceptions: +============================================================================== + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. 
For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. 
This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+ + +---- LLVM Exceptions to the Apache 2.0 License ---- + +As an exception, if, as a result of your compiling your source code, portions +of this Software are embedded into an Object form of such source code, you +may redistribute such embedded portions in such Object form without complying +with the conditions of Sections 4(a), 4(b) and 4(d) of the License. + +In addition, if you combine or link compiled forms of this Software with +software that is licensed under the GPLv2 ("Combined Software") and if a +court of competent jurisdiction determines that the patent provision (Section +3), the indemnity provision (Section 9) or other Section of the License +conflicts with the conditions of the GPLv2, you may retroactively and +prospectively choose to deem waived or otherwise exclude such Section(s) of +the License, but only in their entirety and only with respect to the Combined +Software. + + + +/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json/test/thirdparty/doctest/LICENSE.txt +-------------------------------------------------------------------------------------------------------------- +The MIT License (MIT) + +Copyright (c) 2016-2021 Viktor Kirilov + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + + +/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb/src/third_party/duktape-1.5.2/LICENSE.txt +------------------------------------------------------------------------------------------------------------------------------------------------ +=============== +Duktape license +=============== + +(http://opensource.org/licenses/MIT) + +Copyright (c) 2013-2016 by Duktape authors (see AUTHORS.rst) + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. + + +/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb/src/third_party/duktape-1.5.2/LICENSE.txt +----------------------------------------------------------------------------------------------------------------------------- +=============== +Duktape license +=============== + +(http://opensource.org/licenses/MIT) + +Copyright (c) 2013-2016 by Duktape authors (see AUTHORS.rst) + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. + + +/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb/src/third_party/duktape-1.8.0/LICENSE.txt +------------------------------------------------------------------------------------------------------------------------------------------------ +=============== +Duktape license +=============== + +(http://opensource.org/licenses/MIT) + +Copyright (c) 2013-2017 by Duktape authors (see AUTHORS.rst) + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+ + +/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb/src/third_party/duktape-1.8.0/LICENSE.txt +----------------------------------------------------------------------------------------------------------------------------- +=============== +Duktape license +=============== + +(http://opensource.org/licenses/MIT) + +Copyright (c) 2013-2017 by Duktape authors (see AUTHORS.rst) + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. + + +/pytorch/third_party/kineto/libkineto/third_party/dynolog/LICENSE +----------------------------------------------------------------- +MIT License + +Copyright (c) Facebook, Inc. and its affiliates. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + + +/pytorch/third_party/opentelemetry-cpp/exporters/etw/include/opentelemetry/exporters/etw/LICENSE +------------------------------------------------------------------------------------------------ +TraceLogging Dynamic for Windows + +Copyright (c) Microsoft Corporation. All rights reserved. 
+ +MIT License + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + + +/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp/3rd_party/include/opentracing/expected/LICENSE +----------------------------------------------------------------------------------------------------------------- +The MIT License (MIT) + +Copyright (c) 2015 Martin Moene +Copyright (c) 2015 Microsoft Corporation. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + + + +/pytorch/third_party/fbgemm/LICENSE +----------------------------------- +BSD License + +For FBGEMM software + +Copyright (c) Meta Platforms, Inc. and affiliates. All rights reserved. + +Redistribution and use in source and binary forms, with or without modification, +are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + + * Neither the name Facebook nor the names of its contributors may be used to + endorse or promote products derived from this software without specific + prior written permission. 
+ +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR +ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON +ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/ffnvcodec/LICENSE.txt +------------------------------------------------------------------------------ +GNU LESSER GENERAL PUBLIC LICENSE +Version 2.1, February 1999 + +Copyright (C) 1991, 1999 Free Software Foundation, Inc. +51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA +Everyone is permitted to copy and distribute verbatim copies +of this license document, but changing it is not allowed. + +[This is the first released version of the Lesser GPL. It also counts + as the successor of the GNU Library Public License, version 2, hence + the version number 2.1.] +Preamble +The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. + +This license, the Lesser General Public License, applies to some specially designated software packages--typically libraries--of the Free Software Foundation and other authors who decide to use it. You can use it too, but we suggest you first think carefully about whether this license or the ordinary General Public License is the better strategy to use in any particular case, based on the explanations below. + +When we speak of free software, we are referring to freedom of use, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish); that you receive source code or can get it if you want it; that you can change the software and use pieces of it in new free programs; and that you are informed that you can do these things. + +To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it. + +For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes to the library and recompiling it. And you must show them these terms so they know their rights. + +We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library. 
+ +To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author's reputation will not be affected by problems that might be introduced by others. + +Finally, software patents pose a constant threat to the existence of any free program. We wish to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license. + +Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs. + +When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library. + +We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances. + +For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License. + +In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system. + +Although the Lesser General Public License is Less protective of the users' freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library. + +The precise terms and conditions for copying, distribution and modification follow. Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, whereas the latter must be combined with the library in order to run. + +TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION +0. 
This License Agreement applies to any software library or other program which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Lesser General Public License (also called "this License"). Each licensee is addressed as "you". + +A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables. + +The "Library", below, refers to any such software library or work which has been distributed under these terms. A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".) + +"Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library. + +Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does. + +1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library. + +You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. + +2. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: + +a) The modified work must itself be a software library. +b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change. +c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License. +d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful. +(For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. 
Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.) + +These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library. + +In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. + +3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices. + +Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy. + +This option is useful when you wish to copy part of the code of the Library into a program that is not a library. + +4. You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange. + +If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code. + +5. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License. + +However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables. 
+ +When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law. + +If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.) + +Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself. + +6. As an exception to the Sections above, you may also combine or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications. + +You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things: + +a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.) +b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with. +c) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution. +d) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place. +e) Verify that the user has already received a copy of these materials or that you have already sent this user a copy. 
+For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. + +It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute. + +7. You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things: + +a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above. +b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. +8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. + +9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it. + +10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License. + +11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. 
For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library. + +If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances. + +It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. + +This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. + +12. If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. + +13. The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. + +Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation. + +14. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. + +NO WARRANTY + +15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + +16. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. + +END OF TERMS AND CONDITIONS +How to Apply These Terms to Your New Libraries +If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License). + +To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. + +one line to give the library's name and an idea of what it does. +Copyright (C) year name of author + +This library is free software; you can redistribute it and/or +modify it under the terms of the GNU Lesser General Public +License as published by the Free Software Foundation; either +version 2.1 of the License, or (at your option) any later version. + +This library is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +Lesser General Public License for more details. + +You should have received a copy of the GNU Lesser General Public +License along with this library; if not, write to the Free Software +Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA +Also add information on how to contact you by electronic and paper mail. + +You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names: + +Yoyodyne, Inc., hereby disclaims all copyright interest in +the library `Frob' (a library for tweaking knobs) written +by James Random Hacker. + +signature of Ty Coon, 1 April 1990 +Ty Coon, President of Vice +That's all there is to it! + +/pytorch/third_party/flash-attention/LICENSE +-------------------------------------------- +BSD 3-Clause License + +Copyright (c) 2022, the respective contributors, as shown by the AUTHORS file. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +* Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +* Neither the name of the copyright holder nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. 
+ +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/flatbuffers/LICENSE +---------------------------------------- + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. 
The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + + +/pytorch/third_party/fmt/LICENSE +-------------------------------- +Copyright (c) 2012 - present, Victor Zverovich and {fmt} contributors + +Permission is hereby granted, free of charge, to any person obtaining +a copy of this software and associated documentation files (the +"Software"), to deal in the Software without restriction, including +without limitation the rights to use, copy, modify, merge, publish, +distribute, sublicense, and/or sell copies of the Software, and to +permit persons to whom the Software is furnished to do so, subject to +the following conditions: + +The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE +LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION +OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION +WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + +--- Optional exception to the license --- + +As an exception, if, as a result of your compiling your source code, portions +of this Software are embedded into a machine-executable object form of such +source code, you may redistribute such embedded portions in such object form +without including the above copyright and permission notices. + + +/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt/LICENSE.rst +------------------------------------------------------------------------------------- +Copyright (c) 2012 - present, Victor Zverovich + +Permission is hereby granted, free of charge, to any person obtaining +a copy of this software and associated documentation files (the +"Software"), to deal in the Software without restriction, including +without limitation the rights to use, copy, modify, merge, publish, +distribute, sublicense, and/or sell copies of the Software, and to +permit persons to whom the Software is furnished to do so, subject to +the following conditions: + +The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software. 
+ +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE +LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION +OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION +WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + +--- Optional exception to the license --- + +As an exception, if, as a result of your compiling your source code, portions +of this Software are embedded into a machine-executable object form of such +source code, you may redistribute such embedded portions in such object form +without including the above copyright and permission notices. + + +/pytorch/third_party/kineto/libkineto/third_party/fmt/LICENSE +------------------------------------------------------------- +Copyright (c) 2012 - present, Victor Zverovich and {fmt} contributors + +Permission is hereby granted, free of charge, to any person obtaining +a copy of this software and associated documentation files (the +"Software"), to deal in the Software without restriction, including +without limitation the rights to use, copy, modify, merge, publish, +distribute, sublicense, and/or sell copies of the Software, and to +permit persons to whom the Software is furnished to do so, subject to +the following conditions: + +The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE +LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION +OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION +WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + +--- Optional exception to the license --- + +As an exception, if, as a result of your compiling your source code, portions +of this Software are embedded into a machine-executable object form of such +source code, you may redistribute such embedded portions in such object form +without including the above copyright and permission notices. + + +/pytorch/third_party/gemmlowp/gemmlowp/LICENSE +---------------------------------------------- + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. 
+ + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. 
If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+ + +/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest/googlemock/scripts/generator/LICENSE +--------------------------------------------------------------------------------------------------------------------------------------------- + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. 
Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. 
Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [2007] Neal Norwitz + Portions Copyright [2007] Google Inc. + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. 
+ You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + + +/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest/googlemock/scripts/generator/LICENSE +-------------------------------------------------------------------------------------------------------------------------- + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. 
The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [2007] Neal Norwitz + Portions Copyright [2007] Google Inc. + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + + +/pytorch/third_party/protobuf/third_party/googletest/googlemock/scripts/generator/LICENSE +----------------------------------------------------------------------------------------- + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. 
+ + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [2007] Neal Norwitz + Portions Copyright [2007] Google Inc. + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + + +/pytorch/third_party/tensorpipe/third_party/googletest/googlemock/scripts/generator/LICENSE +------------------------------------------------------------------------------------------- + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. 
For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. 
This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [2007] Neal Norwitz + Portions Copyright [2007] Google Inc. + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+ + +/pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/gettimeofday/LICENSE +----------------------------------------------------------------------------- +/* + * Copied from PostgreSQL source: + * http://doxygen.postgresql.org/gettimeofday_8c_source.html + * + */ + +/* + * gettimeofday.c + * Win32 gettimeofday() replacement + * + * src/port/gettimeofday.c + * + * Copyright (c) 2003 SRA, Inc. + * Copyright (c) 2003 SKC, Inc. + * + * Permission to use, copy, modify, and distribute this software and + * its documentation for any purpose, without fee, and without a + * written agreement is hereby granted, provided that the above + * copyright notice and this paragraph and the following two + * paragraphs appear in all copies. + * + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT, + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED + * OF THE POSSIBILITY OF SUCH DAMAGE. + * + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. + */ + + +/pytorch/third_party/gloo/LICENSE +--------------------------------- +BSD License + +For Gloo software + +Copyright (c) 2017-present, Facebook, Inc. All rights reserved. + +Redistribution and use in source and binary forms, with or without modification, +are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + + * Neither the name Facebook nor the names of its contributors may be used to + endorse or promote products derived from this software without specific + prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR +ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON +ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/protobuf/third_party/googletest/googlemock/LICENSE +----------------------------------------------------------------------- +Copyright 2008, Google Inc. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + * Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. 
+ * Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the following disclaimer +in the documentation and/or other materials provided with the +distribution. + * Neither the name of Google Inc. nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/tensorpipe/third_party/googletest/googlemock/LICENSE +------------------------------------------------------------------------- +Copyright 2008, Google Inc. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + * Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the following disclaimer +in the documentation and/or other materials provided with the +distribution. + * Neither the name of Google Inc. nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/fbgemm/external/googletest/LICENSE +------------------------------------------------------- +Copyright 2008, Google Inc. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + * Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the following disclaimer +in the documentation and/or other materials provided with the +distribution. + * Neither the name of Google Inc. 
nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/googletest/LICENSE +--------------------------------------- +Copyright 2008, Google Inc. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + * Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the following disclaimer +in the documentation and/or other materials provided with the +distribution. + * Neither the name of Google Inc. nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest/LICENSE +---------------------------------------------------------------------------------------- +Copyright 2008, Google Inc. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + * Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the following disclaimer +in the documentation and/or other materials provided with the +distribution. + * Neither the name of Google Inc. nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. 
+ +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest/LICENSE +---------------------------------------------------------------------------------------------------------------- +Copyright 2008, Google Inc. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + * Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the following disclaimer +in the documentation and/or other materials provided with the +distribution. + * Neither the name of Google Inc. nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/kineto/libkineto/third_party/googletest/LICENSE +-------------------------------------------------------------------- +Copyright 2008, Google Inc. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + * Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the following disclaimer +in the documentation and/or other materials provided with the +distribution. + * Neither the name of Google Inc. nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. 
+ +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/opentelemetry-cpp/third_party/googletest/LICENSE +--------------------------------------------------------------------- +Copyright 2008, Google Inc. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + * Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the following disclaimer +in the documentation and/or other materials provided with the +distribution. + * Neither the name of Google Inc. nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest/LICENSE +--------------------------------------------------------------------------------------------- +Copyright 2008, Google Inc. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + * Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the following disclaimer +in the documentation and/or other materials provided with the +distribution. + * Neither the name of Google Inc. nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. 
+ +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/protobuf/third_party/googletest/LICENSE +------------------------------------------------------------ +Copyright 2008, Google Inc. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + * Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the following disclaimer +in the documentation and/or other materials provided with the +distribution. + * Neither the name of Google Inc. nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/protobuf/third_party/googletest/googletest/LICENSE +----------------------------------------------------------------------- +Copyright 2008, Google Inc. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + * Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the following disclaimer +in the documentation and/or other materials provided with the +distribution. + * Neither the name of Google Inc. nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/tensorpipe/third_party/googletest/LICENSE +-------------------------------------------------------------- +Copyright 2008, Google Inc. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + * Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the following disclaimer +in the documentation and/or other materials provided with the +distribution. + * Neither the name of Google Inc. nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/tensorpipe/third_party/googletest/googletest/LICENSE +------------------------------------------------------------------------- +Copyright 2008, Google Inc. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + * Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the following disclaimer +in the documentation and/or other materials provided with the +distribution. + * Neither the name of Google Inc. nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/ideep/mkl-dnn/tests/gtests/gtest/LICENSE +------------------------------------------------------------- +Copyright 2008, Google Inc. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + * Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the following disclaimer +in the documentation and/or other materials provided with the +distribution. + * Neither the name of Google Inc. nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/fbgemm/external/hipify_torch/LICENSE.txt +------------------------------------------------------------- +MIT License + +Copyright (c) 2021-2024, Advanced Micro Devices, Inc. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. 
+ + +/pytorch/third_party/fbgemm/fbgemm_gpu/experimental/hstu/LICENSE +---------------------------------------------------------------- +BSD 3-Clause License + +Copyright (c) 2022, the respective contributors, as shown by the AUTHORS file. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +* Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +* Neither the name of the copyright holder nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +/* + * SPDX-FileCopyrightText: Copyright (c) <2024> NVIDIA CORPORATION & AFFILIATES. All rights reserved. + * SPDX-License-Identifier: LicenseRef-NvidiaProprietary + * + * NVIDIA CORPORATION, its affiliates and licensors retain all intellectual + * property and proprietary rights in and to this material, related + * documentation and any modifications thereto. Any use, reproduction, + * disclosure or distribution of this material and related documentation + * without an express license agreement from NVIDIA CORPORATION or + * its affiliates is strictly prohibited. + */ + +/pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/hungarian/LICENSE.txt +------------------------------------------------------------------------------ +/******************************************************************** + ******************************************************************** + ** + ** libhungarian by Cyrill Stachniss, 2004 + ** + ** + ** Solving the Minimum Assignment Problem using the + ** Hungarian Method. + ** + ** ** This file may be freely copied and distributed! ** + ** + ** Parts of the used code was originally provided by the + ** "Stanford GraphGase", but I made changes to this code. + ** As asked by the copyright node of the "Stanford GraphGase", + ** I hereby proclaim that this file are *NOT* part of the + ** "Stanford GraphGase" distrubition! + ** + ** This file is distributed in the hope that it will be useful, + ** but WITHOUT ANY WARRANTY; without even the implied + ** warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR + ** PURPOSE. + ** + ******************************************************************** + ********************************************************************/ + + +/pytorch/third_party/ideep/LICENSE +---------------------------------- +Copyright (c) 2018 Intel Corporation. 
+ +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. + + +/pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/irrlicht/LICENSE.txt +----------------------------------------------------------------------------- +The Irrlicht Engine License +=========================== + +Copyright (C) 2002-2015 Nikolaus Gebhardt + +This software is provided 'as-is', without any express or implied +warranty. In no event will the authors be held liable for any damages +arising from the use of this software. + +Permission is granted to anyone to use this software for any purpose, +including commercial applications, and to alter it and redistribute it +freely, subject to the following restrictions: + +1. The origin of this software must not be misrepresented; you must not + claim that you wrote the original software. If you use this software + in a product, an acknowledgement in the product documentation would be + appreciated but is not required. +2. Altered source versions must be clearly marked as such, and must not be + misrepresented as being the original software. +3. This notice may not be removed or altered from any source distribution. + +/pytorch/third_party/kineto/LICENSE +----------------------------------- +BSD License + +For Kineto software + +Copyright (c) Meta Platforms, Inc. and affiliates. + +All contributions by Microsoft: +Copyright (c) Microsoft Corporation. (The Azure AI Platform team) + +Redistribution and use in source and binary forms, with or without modification, +are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + + * Neither the name Meta nor the names of its contributors may be used to + endorse or promote products derived from this software without specific + prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR +ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON +ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/tensorpipe/third_party/libnop/LICENSE +---------------------------------------------------------- +Copyright 2017 The Native Object Protocols Authors + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + https://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. + + +/pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/libstemmer/LICENSE +--------------------------------------------------------------------------- +Snowball - License +Except where explicitly noted, all the software given out on this Snowball site is covered by the 3-clause BSD License: + +Copyright (c) 2001, Dr Martin Porter, +Copyright (c) 2002, Richard Boulton. +All rights reserved. + +Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +Essentially, all this means is that you can do what you like with the code, except claim another Copyright for it, or claim that it is issued under a different license. The software is also issued without warranties, which means that if anyone suffers through its use, they cannot come back and sue you. You also have to alert anyone to whom you give the Snowball software to the fact that it is covered by the BSD license. 
+ +We have not bothered to insert the licensing arrangement into the text of the Snowball software. + + +/pytorch/third_party/tensorpipe/third_party/libuv/LICENSE +--------------------------------------------------------- +Copyright (c) 2015-present libuv project contributors. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to +deal in the Software without restriction, including without limitation the +rights to use, copy, modify, merge, publish, distribute, sublicense, and/or +sell copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING +FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS +IN THE SOFTWARE. + + +/pytorch/third_party/mimalloc/LICENSE +------------------------------------- +MIT License + +Copyright (c) 2018-2025 Microsoft Corporation, Daan Leijen + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + + +/pytorch/third_party/miniz-3.0.2/LICENSE +---------------------------------------- +Copyright 2013-2014 RAD Game Tools and Valve Software +Copyright 2010-2014 Rich Geldreich and Tenacious Software LLC + +All Rights Reserved. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. + + +/pytorch/third_party/ideep/mkl-dnn/LICENSE +------------------------------------------ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." 
+ + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. 
+ + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + ============================================================================ + + Copyright 2016-2023 Intel Corporation + Copyright 2018 YANDEX LLC + Copyright 2019-2023 FUJITSU LIMITED + Copyright 2020-2023 Arm Ltd. 
and affiliates + Copyright 2020-2022 Codeplay Software Limited + Copyright 2021 Alanna Tempest + Copyright 2022-2023 IBM Corporation + Copyright 2023 KNS Group LLC (YADRO) + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + + This distribution includes third party software ("third party programs"). + This third party software, even if included with the distribution of + the Intel software, may be governed by separate license terms, including + without limitation, third party license terms, other Intel software license + terms, and open source software license terms. These separate license terms + govern your use of the third party programs as set forth in the + "THIRD-PARTY-PROGRAMS" file. + + +/pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl/LICENSE +----------------------------------------------------------------- +Copyright (c) 2015 Microsoft Corporation. All rights reserved. + +This code is licensed under the MIT License (MIT). + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies +of the Software, and to permit persons to whom the Software is furnished to do +so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. + + +/pytorch/third_party/fbgemm/fbgemm_gpu/src/quantize_ops/mx/LICENSE +------------------------------------------------------------------ + MIT License + + Copyright (c) Microsoft Corporation. + + Permission is hereby granted, free of charge, to any person obtaining a copy + of this software and associated documentation files (the "Software"), to deal + in the Software without restriction, including without limitation the rights + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + copies of the Software, and to permit persons to whom the Software is + furnished to do so, subject to the following conditions: + + The above copyright notice and this permission notice shall be included in all + copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + SOFTWARE + + +/pytorch/third_party/fbgemm/fbgemm_gpu/test/quantize/mx/LICENSE +--------------------------------------------------------------- + MIT License + + Copyright (c) Microsoft Corporation. + + Permission is hereby granted, free of charge, to any person obtaining a copy + of this software and associated documentation files (the "Software"), to deal + in the Software without restriction, including without limitation the rights + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + copies of the Software, and to permit persons to whom the Software is + furnished to do so, subject to the following conditions: + + The above copyright notice and this permission notice shall be included in all + copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + SOFTWARE + + +/pytorch/third_party/onnx/LICENSE +--------------------------------- + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. 
For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + + +/pytorch/third_party/opentelemetry-cpp/LICENSE +---------------------------------------------- + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. 
+ + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. 
If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + + +/pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto/LICENSE +------------------------------------------------------------------------------ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. 
+ + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. 
This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+ + +/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp/LICENSE +-------------------------------------------------------------------------- + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. 
Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "{}" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright The OpenTracing Authors + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. 
+ You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + +/pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/pdcurses/LICENSE +------------------------------------------------------------------------- +The core package is in the public domain, but small portions of PDCurses are subject to copyright under various licenses. + +The win32 files are released to the public domain. + +If you use PDCurses in an application, an acknowledgement would be appreciated, but is not mandatory. If you make corrections or enhancements to PDCurses, please forward them to the current maintainer for the benefit of other users. + +This software is provided AS IS with NO WARRANTY whatsoever. + +/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs/LICENSE +--------------------------------------------------------------------------------- + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. 
+ + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + Copyright 2020-present Daniel Trugman + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + + +/pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/physac/LICENSE +----------------------------------------------------------------------- +MIT License + +Copyright (c) 2022 Víctor Fisac + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + +/pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/pqp/LICENSE +-------------------------------------------------------------------- +Copyright 1999 University of North Carolina at Chapel Hill. +All rights reserved. 
+ +Permission to use, copy, modify, and distribute this software and its +documentation for educational, research, and non-profit purposes, without fee, +and without a written agreement is hereby granted, provided that the above +copyright notice and the following three paragraphs appear in all copies. + +IN NO EVENT SHALL THE UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL BE LIABLE TO +ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, +INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS +DOCUMENTATION, EVEN IF THE UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL HAS +BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. + +THE UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL SPECIFICALLY DISCLAIMS ANY +WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED +HEREUNDER IS ON AN "AS IS" BASIS, AND THE UNIVERSITY OF NORTH CAROLINA AT +CHAPEL HILL HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, +ENHANCEMENTS, OR MODIFICATIONS. + +The authors may be contacted via: + +US Mail: Eric Larsen, Stefan Gottschalk + Department of Computer Science + Sitterson Hall, CB #3175 + University of North Carolina + Chapel Hill, NC 27599-3175 + +Phone: (919) 962-1749 + +Email: geom@cs.unc.edu + +/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/LICENSE +-------------------------------------------------------------------------------------------- +MIT License + +Copyright (c) 2016-2021 Jupp Mueller +Copyright (c) 2017-2022 Gregor Jasny + +And many contributors, see +https://github.com/jupp0r/prometheus-cpp/graphs/contributors + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. 
+ + +/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/LICENSE +------------------------------------------------------------------------- +MIT License + +Copyright (c) 2016-2021 Jupp Mueller +Copyright (c) 2017-2022 Gregor Jasny + +And many contributors, see +https://github.com/jupp0r/prometheus-cpp/graphs/contributors + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + + +/pytorch/third_party/protobuf/LICENSE +------------------------------------- +Copyright 2008 Google Inc. All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + * Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + * Redistributions in binary form must reproduce the above +copyright notice, this list of conditions and the following disclaimer +in the documentation and/or other materials provided with the +distribution. + * Neither the name of Google Inc. nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Code generated by the Protocol Buffer compiler is owned by the owner +of the input file used when generating it. This code is not +standalone and requires a support library to be linked with it. This +support library is itself covered by the above license. + + +/pytorch/third_party/psimd/LICENSE +---------------------------------- +The MIT License (MIT) + +Copyright (c) 2017 Facebook Inc. 
+Copyright (c) 2014-2017 Georgia Institute of Technology
+Copyright 2019 Google LLC
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+
+/pytorch/third_party/pthreadpool/LICENSE
+----------------------------------------
+Copyright 2019 Google LLC
+Copyright (c) 2017 Facebook Inc.
+Copyright (c) 2015-2017 Georgia Institute of Technology
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+* Redistributions of source code must retain the above copyright notice, this
+  list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright notice,
+  this list of conditions and the following disclaimer in the documentation
+  and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+
+/pytorch/third_party/onnx/third_party/pybind11/LICENSE
+------------------------------------------------------
+Copyright (c) 2016 Wenzel Jakob <wenzel.jakob@epfl.ch>, All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this
+   list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution.
+
+3. Neither the name of the copyright holder nor the names of its contributors
+   may be used to endorse or promote products derived from this software
+   without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Please also refer to the file .github/CONTRIBUTING.md, which clarifies licensing of
+external contributions to this project including patches, pull requests, etc.
+
+
+/pytorch/third_party/pybind11/LICENSE
+-------------------------------------
+Copyright (c) 2016 Wenzel Jakob <wenzel.jakob@epfl.ch>, All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this
+   list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution.
+
+3. Neither the name of the copyright holder nor the names of its contributors
+   may be used to endorse or promote products derived from this software
+   without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Please also refer to the file .github/CONTRIBUTING.md, which clarifies licensing of
+external contributions to this project including patches, pull requests, etc.
+
+
+/pytorch/third_party/tensorpipe/third_party/pybind11/LICENSE
+------------------------------------------------------------
+Copyright (c) 2016 Wenzel Jakob <wenzel.jakob@epfl.ch>, All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this
+   list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution.
+
+3. 
Neither the name of the copyright holder nor the names of its contributors + may be used to endorse or promote products derived from this software + without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +Please also refer to the file CONTRIBUTING.md, which clarifies licensing of +external contributions to this project including patches, pull requests, etc. + + +/pytorch/third_party/NVTX/python/LICENSE.txt +-------------------------------------------- +============================================================================== +NVTX is under the Apache License v2.0 with LLVM Exceptions: +============================================================================== + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. 
For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + + +---- LLVM Exceptions to the Apache 2.0 License ---- + +As an exception, if, as a result of your compiling your source code, portions +of this Software are embedded into an Object form of such source code, you +may redistribute such embedded portions in such Object form without complying +with the conditions of Sections 4(a), 4(b) and 4(d) of the License. + +In addition, if you combine or link compiled forms of this Software with +software that is licensed under the GPLv2 ("Combined Software") and if a +court of competent jurisdiction determines that the patent provision (Section +3), the indemnity provision (Section 9) or other Section of the License +conflicts with the conditions of the GPLv2, you may retroactively and +prospectively choose to deem waived or otherwise exclude such Section(s) of +the License, but only in their entirety and only with respect to the Combined +Software. 
+ + + +/pytorch/third_party/cutlass/python/LICENSE.txt +----------------------------------------------- +Copyright (c) 2017 - 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +SPDX-License-Identifier: BSD-3-Clause + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this +list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, +this list of conditions and the following disclaimer in the documentation +and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/fbgemm/external/cutlass/python/LICENSE.txt +--------------------------------------------------------------- +Copyright (c) 2017 - 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +SPDX-License-Identifier: BSD-3-Clause + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this +list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, +this list of conditions and the following disclaimer in the documentation +and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ + +/pytorch/third_party/flash-attention/csrc/cutlass/python/LICENSE.txt +-------------------------------------------------------------------- +Copyright (c) 2017 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +SPDX-License-Identifier: BSD-3-Clause + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this +list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, +this list of conditions and the following disclaimer in the documentation +and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its +contributors may be used to endorse or promote products derived from +this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/python-peachpy/LICENSE.rst +----------------------------------------------- +============================== +PeachPy license (2-clause BSD) +============================== + +Copyright (c) 2017, Facebook Inc. +Copyright (c) 2013-2017, Georgia Institute of Technology +All rights reserved. + +Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/sigslot/LICENSE +------------------------------------------------------------------------ +License +The sigslot library has been placed in the public domain. This means that you are free to use it however you like. 
+ +The author takes no responsibility or liability of any kind for any use that you may make of this library. + +If you screw up, it's your fault. + +If the library screws up, you got it for free, so you should have tested it better - it's still your responsibility. + +/pytorch/third_party/sleef/LICENSE.txt +-------------------------------------- +Boost Software License - Version 1.0 - August 17th, 2003 + +Permission is hereby granted, free of charge, to any person or organization +obtaining a copy of the software and accompanying documentation covered by +this license (the "Software") to use, reproduce, display, distribute, +execute, and transmit the Software, and to prepare derivative works of the +Software, and to permit third-parties to whom the Software is furnished to +do so, all subject to the following: + +The copyright notices in the Software and this entire statement, including +the above license grant, this restriction and the following disclaimer, +must be included in all copies of the Software, in whole or in part, and +all derivative works of the Software, unless such copies or derivative +works are solely in the form of machine-executable object code generated by +a source language processor. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT +SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE +FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, +ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +DEALINGS IN THE SOFTWARE. + + +/pytorch/third_party/flatbuffers/swift/LICENSE +---------------------------------------------- + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). 
+ + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + + +/pytorch/third_party/kineto/tb_plugin/LICENSE +--------------------------------------------- +BSD License + +For Kineto software + +Copyright (c) Facebook, Inc. and its affiliates. All rights reserved. + +All contributions by Microsoft: +Copyright (c) Microsoft Corporation. (The Azure AI Platform team) + +Redistribution and use in source and binary forms, with or without modification, +are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + + * Neither the name Facebook nor the names of its contributors may be used to + endorse or promote products derived from this software without specific + prior written permission. 
+ +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR +ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON +ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/tensorflow-common/LICENSE.txt +-------------------------------------------------------------------------------------- +Copyright (c) Microsoft Corporation + +All rights reserved. + +MIT License + +Permission is hereby granted, free of charge, to any person obtaining a copy of +this software and associated documentation files (the "Software"), to deal in +the Software without restriction, including without limitation the rights to +use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies +of the Software, and to permit persons to whom the Software is furnished to do +so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + + +/pytorch/third_party/tensorpipe/LICENSE.txt +------------------------------------------- +BSD License + +For TensorPipe software + +Copyright (c) Meta Platforms, Inc. and affiliates. All rights reserved. + +Redistribution and use in source and binary forms, with or without modification, +are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + + * Neither the name Meta nor the names of its contributors may be used to + endorse or promote products derived from this software without specific + prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR +ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON +ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr/test/LICENSE +-------------------------------------------------------------------------------------- +This license applies to everything inside this directory and all +subdirectories. + + GNU GENERAL PUBLIC LICENSE + Version 3, 29 June 2007 + + Copyright (C) 2007 Free Software Foundation, Inc. + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The GNU General Public License is a free, copyleft license for +software and other kinds of works. + + The licenses for most software and other practical works are designed +to take away your freedom to share and change the works. By contrast, +the GNU General Public License is intended to guarantee your freedom to +share and change all versions of a program--to make sure it remains free +software for all its users. We, the Free Software Foundation, use the +GNU General Public License for most of our software; it applies also to +any other work released this way by its authors. You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +them if you wish), that you receive source code or can get it if you +want it, that you can change the software or use pieces of it in new +free programs, and that you know you can do these things. + + To protect your rights, we need to prevent others from denying you +these rights or asking you to surrender the rights. Therefore, you have +certain responsibilities if you distribute copies of the software, or if +you modify it: responsibilities to respect the freedom of others. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must pass on to the recipients the same +freedoms that you received. You must make sure that they, too, receive +or can get the source code. And you must show them these terms so they +know their rights. + + Developers that use the GNU GPL protect your rights with two steps: +(1) assert copyright on the software, and (2) offer you this License +giving you legal permission to copy, distribute and/or modify it. + + For the developers' and authors' protection, the GPL clearly explains +that there is no warranty for this free software. For both users' and +authors' sake, the GPL requires that modified versions be marked as +changed, so that their problems will not be attributed erroneously to +authors of previous versions. + + Some devices are designed to deny users access to install or run +modified versions of the software inside them, although the manufacturer +can do so. This is fundamentally incompatible with the aim of +protecting users' freedom to change the software. 
The systematic +pattern of such abuse occurs in the area of products for individuals to +use, which is precisely where it is most unacceptable. Therefore, we +have designed this version of the GPL to prohibit the practice for those +products. If such problems arise substantially in other domains, we +stand ready to extend this provision to those domains in future versions +of the GPL, as needed to protect the freedom of users. + + Finally, every program is threatened constantly by software patents. +States should not allow patents to restrict development and use of +software on general-purpose computers, but in those that do, we wish to +avoid the special danger that patents applied to a free program could +make it effectively proprietary. To prevent this, the GPL assures that +patents cannot be used to render the program non-free. + + The precise terms and conditions for copying, distribution and +modification follow. + + TERMS AND CONDITIONS + + 0. Definitions. + + "This License" refers to version 3 of the GNU General Public License. + + "Copyright" also means copyright-like laws that apply to other kinds of +works, such as semiconductor masks. + + "The Program" refers to any copyrightable work licensed under this +License. Each licensee is addressed as "you". "Licensees" and +"recipients" may be individuals or organizations. + + To "modify" a work means to copy from or adapt all or part of the work +in a fashion requiring copyright permission, other than the making of an +exact copy. The resulting work is called a "modified version" of the +earlier work or a work "based on" the earlier work. + + A "covered work" means either the unmodified Program or a work based +on the Program. + + To "propagate" a work means to do anything with it that, without +permission, would make you directly or secondarily liable for +infringement under applicable copyright law, except executing it on a +computer or modifying a private copy. Propagation includes copying, +distribution (with or without modification), making available to the +public, and in some countries other activities as well. + + To "convey" a work means any kind of propagation that enables other +parties to make or receive copies. Mere interaction with a user through +a computer network, with no transfer of a copy, is not conveying. + + An interactive user interface displays "Appropriate Legal Notices" +to the extent that it includes a convenient and prominently visible +feature that (1) displays an appropriate copyright notice, and (2) +tells the user that there is no warranty for the work (except to the +extent that warranties are provided), that licensees may convey the +work under this License, and how to view a copy of this License. If +the interface presents a list of user commands or options, such as a +menu, a prominent item in the list meets this criterion. + + 1. Source Code. + + The "source code" for a work means the preferred form of the work +for making modifications to it. "Object code" means any non-source +form of a work. + + A "Standard Interface" means an interface that either is an official +standard defined by a recognized standards body, or, in the case of +interfaces specified for a particular programming language, one that +is widely used among developers working in that language. 
+ + The "System Libraries" of an executable work include anything, other +than the work as a whole, that (a) is included in the normal form of +packaging a Major Component, but which is not part of that Major +Component, and (b) serves only to enable use of the work with that +Major Component, or to implement a Standard Interface for which an +implementation is available to the public in source code form. A +"Major Component", in this context, means a major essential component +(kernel, window system, and so on) of the specific operating system +(if any) on which the executable work runs, or a compiler used to +produce the work, or an object code interpreter used to run it. + + The "Corresponding Source" for a work in object code form means all +the source code needed to generate, install, and (for an executable +work) run the object code and to modify the work, including scripts to +control those activities. However, it does not include the work's +System Libraries, or general-purpose tools or generally available free +programs which are used unmodified in performing those activities but +which are not part of the work. For example, Corresponding Source +includes interface definition files associated with source files for +the work, and the source code for shared libraries and dynamically +linked subprograms that the work is specifically designed to require, +such as by intimate data communication or control flow between those +subprograms and other parts of the work. + + The Corresponding Source need not include anything that users +can regenerate automatically from other parts of the Corresponding +Source. + + The Corresponding Source for a work in source code form is that +same work. + + 2. Basic Permissions. + + All rights granted under this License are granted for the term of +copyright on the Program, and are irrevocable provided the stated +conditions are met. This License explicitly affirms your unlimited +permission to run the unmodified Program. The output from running a +covered work is covered by this License only if the output, given its +content, constitutes a covered work. This License acknowledges your +rights of fair use or other equivalent, as provided by copyright law. + + You may make, run and propagate covered works that you do not +convey, without conditions so long as your license otherwise remains +in force. You may convey covered works to others for the sole purpose +of having them make modifications exclusively for you, or provide you +with facilities for running those works, provided that you comply with +the terms of this License in conveying all material for which you do +not control copyright. Those thus making or running the covered works +for you must do so exclusively on your behalf, under your direction +and control, on terms that prohibit them from making any copies of +your copyrighted material outside their relationship with you. + + Conveying under any other circumstances is permitted solely under +the conditions stated below. Sublicensing is not allowed; section 10 +makes it unnecessary. + + 3. Protecting Users' Legal Rights From Anti-Circumvention Law. + + No covered work shall be deemed part of an effective technological +measure under any applicable law fulfilling obligations under article +11 of the WIPO copyright treaty adopted on 20 December 1996, or +similar laws prohibiting or restricting circumvention of such +measures. 
+ + When you convey a covered work, you waive any legal power to forbid +circumvention of technological measures to the extent such circumvention +is effected by exercising rights under this License with respect to +the covered work, and you disclaim any intention to limit operation or +modification of the work as a means of enforcing, against the work's +users, your or third parties' legal rights to forbid circumvention of +technological measures. + + 4. Conveying Verbatim Copies. + + You may convey verbatim copies of the Program's source code as you +receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy an appropriate copyright notice; +keep intact all notices stating that this License and any +non-permissive terms added in accord with section 7 apply to the code; +keep intact all notices of the absence of any warranty; and give all +recipients a copy of this License along with the Program. + + You may charge any price or no price for each copy that you convey, +and you may offer support or warranty protection for a fee. + + 5. Conveying Modified Source Versions. + + You may convey a work based on the Program, or the modifications to +produce it from the Program, in the form of source code under the +terms of section 4, provided that you also meet all of these conditions: + + a) The work must carry prominent notices stating that you modified + it, and giving a relevant date. + + b) The work must carry prominent notices stating that it is + released under this License and any conditions added under section + 7. This requirement modifies the requirement in section 4 to + "keep intact all notices". + + c) You must license the entire work, as a whole, under this + License to anyone who comes into possession of a copy. This + License will therefore apply, along with any applicable section 7 + additional terms, to the whole of the work, and all its parts, + regardless of how they are packaged. This License gives no + permission to license the work in any other way, but it does not + invalidate such permission if you have separately received it. + + d) If the work has interactive user interfaces, each must display + Appropriate Legal Notices; however, if the Program has interactive + interfaces that do not display Appropriate Legal Notices, your + work need not make them do so. + + A compilation of a covered work with other separate and independent +works, which are not by their nature extensions of the covered work, +and which are not combined with it such as to form a larger program, +in or on a volume of a storage or distribution medium, is called an +"aggregate" if the compilation and its resulting copyright are not +used to limit the access or legal rights of the compilation's users +beyond what the individual works permit. Inclusion of a covered work +in an aggregate does not cause this License to apply to the other +parts of the aggregate. + + 6. Conveying Non-Source Forms. + + You may convey a covered work in object code form under the terms +of sections 4 and 5, provided that you also convey the +machine-readable Corresponding Source under the terms of this License, +in one of these ways: + + a) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by the + Corresponding Source fixed on a durable physical medium + customarily used for software interchange. 
+ + b) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by a + written offer, valid for at least three years and valid for as + long as you offer spare parts or customer support for that product + model, to give anyone who possesses the object code either (1) a + copy of the Corresponding Source for all the software in the + product that is covered by this License, on a durable physical + medium customarily used for software interchange, for a price no + more than your reasonable cost of physically performing this + conveying of source, or (2) access to copy the + Corresponding Source from a network server at no charge. + + c) Convey individual copies of the object code with a copy of the + written offer to provide the Corresponding Source. This + alternative is allowed only occasionally and noncommercially, and + only if you received the object code with such an offer, in accord + with subsection 6b. + + d) Convey the object code by offering access from a designated + place (gratis or for a charge), and offer equivalent access to the + Corresponding Source in the same way through the same place at no + further charge. You need not require recipients to copy the + Corresponding Source along with the object code. If the place to + copy the object code is a network server, the Corresponding Source + may be on a different server (operated by you or a third party) + that supports equivalent copying facilities, provided you maintain + clear directions next to the object code saying where to find the + Corresponding Source. Regardless of what server hosts the + Corresponding Source, you remain obligated to ensure that it is + available for as long as needed to satisfy these requirements. + + e) Convey the object code using peer-to-peer transmission, provided + you inform other peers where the object code and Corresponding + Source of the work are being offered to the general public at no + charge under subsection 6d. + + A separable portion of the object code, whose source code is excluded +from the Corresponding Source as a System Library, need not be +included in conveying the object code work. + + A "User Product" is either (1) a "consumer product", which means any +tangible personal property which is normally used for personal, family, +or household purposes, or (2) anything designed or sold for incorporation +into a dwelling. In determining whether a product is a consumer product, +doubtful cases shall be resolved in favor of coverage. For a particular +product received by a particular user, "normally used" refers to a +typical or common use of that class of product, regardless of the status +of the particular user or of the way in which the particular user +actually uses, or expects or is expected to use, the product. A product +is a consumer product regardless of whether the product has substantial +commercial, industrial or non-consumer uses, unless such uses represent +the only significant mode of use of the product. + + "Installation Information" for a User Product means any methods, +procedures, authorization keys, or other information required to install +and execute modified versions of a covered work in that User Product from +a modified version of its Corresponding Source. The information must +suffice to ensure that the continued functioning of the modified object +code is in no case prevented or interfered with solely because +modification has been made. 
+ + If you convey an object code work under this section in, or with, or +specifically for use in, a User Product, and the conveying occurs as +part of a transaction in which the right of possession and use of the +User Product is transferred to the recipient in perpetuity or for a +fixed term (regardless of how the transaction is characterized), the +Corresponding Source conveyed under this section must be accompanied +by the Installation Information. But this requirement does not apply +if neither you nor any third party retains the ability to install +modified object code on the User Product (for example, the work has +been installed in ROM). + + The requirement to provide Installation Information does not include a +requirement to continue to provide support service, warranty, or updates +for a work that has been modified or installed by the recipient, or for +the User Product in which it has been modified or installed. Access to a +network may be denied when the modification itself materially and +adversely affects the operation of the network or violates the rules and +protocols for communication across the network. + + Corresponding Source conveyed, and Installation Information provided, +in accord with this section must be in a format that is publicly +documented (and with an implementation available to the public in +source code form), and must require no special password or key for +unpacking, reading or copying. + + 7. Additional Terms. + + "Additional permissions" are terms that supplement the terms of this +License by making exceptions from one or more of its conditions. +Additional permissions that are applicable to the entire Program shall +be treated as though they were included in this License, to the extent +that they are valid under applicable law. If additional permissions +apply only to part of the Program, that part may be used separately +under those permissions, but the entire Program remains governed by +this License without regard to the additional permissions. + + When you convey a copy of a covered work, you may at your option +remove any additional permissions from that copy, or from any part of +it. (Additional permissions may be written to require their own +removal in certain cases when you modify the work.) You may place +additional permissions on material, added by you to a covered work, +for which you have or can give appropriate copyright permission. 
+ + Notwithstanding any other provision of this License, for material you +add to a covered work, you may (if authorized by the copyright holders of +that material) supplement the terms of this License with terms: + + a) Disclaiming warranty or limiting liability differently from the + terms of sections 15 and 16 of this License; or + + b) Requiring preservation of specified reasonable legal notices or + author attributions in that material or in the Appropriate Legal + Notices displayed by works containing it; or + + c) Prohibiting misrepresentation of the origin of that material, or + requiring that modified versions of such material be marked in + reasonable ways as different from the original version; or + + d) Limiting the use for publicity purposes of names of licensors or + authors of the material; or + + e) Declining to grant rights under trademark law for use of some + trade names, trademarks, or service marks; or + + f) Requiring indemnification of licensors and authors of that + material by anyone who conveys the material (or modified versions of + it) with contractual assumptions of liability to the recipient, for + any liability that these contractual assumptions directly impose on + those licensors and authors. + + All other non-permissive additional terms are considered "further +restrictions" within the meaning of section 10. If the Program as you +received it, or any part of it, contains a notice stating that it is +governed by this License along with a term that is a further +restriction, you may remove that term. If a license document contains +a further restriction but permits relicensing or conveying under this +License, you may add to a covered work material governed by the terms +of that license document, provided that the further restriction does +not survive such relicensing or conveying. + + If you add terms to a covered work in accord with this section, you +must place, in the relevant source files, a statement of the +additional terms that apply to those files, or a notice indicating +where to find the applicable terms. + + Additional terms, permissive or non-permissive, may be stated in the +form of a separately written license, or stated as exceptions; +the above requirements apply either way. + + 8. Termination. + + You may not propagate or modify a covered work except as expressly +provided under this License. Any attempt otherwise to propagate or +modify it is void, and will automatically terminate your rights under +this License (including any patent licenses granted under the third +paragraph of section 11). + + However, if you cease all violation of this License, then your +license from a particular copyright holder is reinstated (a) +provisionally, unless and until the copyright holder explicitly and +finally terminates your license, and (b) permanently, if the copyright +holder fails to notify you of the violation by some reasonable means +prior to 60 days after the cessation. + + Moreover, your license from a particular copyright holder is +reinstated permanently if the copyright holder notifies you of the +violation by some reasonable means, this is the first time you have +received notice of violation of this License (for any work) from that +copyright holder, and you cure the violation prior to 30 days after +your receipt of the notice. + + Termination of your rights under this section does not terminate the +licenses of parties who have received copies or rights from you under +this License. 
If your rights have been terminated and not permanently +reinstated, you do not qualify to receive new licenses for the same +material under section 10. + + 9. Acceptance Not Required for Having Copies. + + You are not required to accept this License in order to receive or +run a copy of the Program. Ancillary propagation of a covered work +occurring solely as a consequence of using peer-to-peer transmission +to receive a copy likewise does not require acceptance. However, +nothing other than this License grants you permission to propagate or +modify any covered work. These actions infringe copyright if you do +not accept this License. Therefore, by modifying or propagating a +covered work, you indicate your acceptance of this License to do so. + + 10. Automatic Licensing of Downstream Recipients. + + Each time you convey a covered work, the recipient automatically +receives a license from the original licensors, to run, modify and +propagate that work, subject to this License. You are not responsible +for enforcing compliance by third parties with this License. + + An "entity transaction" is a transaction transferring control of an +organization, or substantially all assets of one, or subdividing an +organization, or merging organizations. If propagation of a covered +work results from an entity transaction, each party to that +transaction who receives a copy of the work also receives whatever +licenses to the work the party's predecessor in interest had or could +give under the previous paragraph, plus a right to possession of the +Corresponding Source of the work from the predecessor in interest, if +the predecessor has it or can get it with reasonable efforts. + + You may not impose any further restrictions on the exercise of the +rights granted or affirmed under this License. For example, you may +not impose a license fee, royalty, or other charge for exercise of +rights granted under this License, and you may not initiate litigation +(including a cross-claim or counterclaim in a lawsuit) alleging that +any patent claim is infringed by making, using, selling, offering for +sale, or importing the Program or any portion of it. + + 11. Patents. + + A "contributor" is a copyright holder who authorizes use under this +License of the Program or a work on which the Program is based. The +work thus licensed is called the contributor's "contributor version". + + A contributor's "essential patent claims" are all patent claims +owned or controlled by the contributor, whether already acquired or +hereafter acquired, that would be infringed by some manner, permitted +by this License, of making, using, or selling its contributor version, +but do not include claims that would be infringed only as a +consequence of further modification of the contributor version. For +purposes of this definition, "control" includes the right to grant +patent sublicenses in a manner consistent with the requirements of +this License. + + Each contributor grants you a non-exclusive, worldwide, royalty-free +patent license under the contributor's essential patent claims, to +make, use, sell, offer for sale, import and otherwise run, modify and +propagate the contents of its contributor version. + + In the following three paragraphs, a "patent license" is any express +agreement or commitment, however denominated, not to enforce a patent +(such as an express permission to practice a patent or covenant not to +sue for patent infringement). 
To "grant" such a patent license to a +party means to make such an agreement or commitment not to enforce a +patent against the party. + + If you convey a covered work, knowingly relying on a patent license, +and the Corresponding Source of the work is not available for anyone +to copy, free of charge and under the terms of this License, through a +publicly available network server or other readily accessible means, +then you must either (1) cause the Corresponding Source to be so +available, or (2) arrange to deprive yourself of the benefit of the +patent license for this particular work, or (3) arrange, in a manner +consistent with the requirements of this License, to extend the patent +license to downstream recipients. "Knowingly relying" means you have +actual knowledge that, but for the patent license, your conveying the +covered work in a country, or your recipient's use of the covered work +in a country, would infringe one or more identifiable patents in that +country that you have reason to believe are valid. + + If, pursuant to or in connection with a single transaction or +arrangement, you convey, or propagate by procuring conveyance of, a +covered work, and grant a patent license to some of the parties +receiving the covered work authorizing them to use, propagate, modify +or convey a specific copy of the covered work, then the patent license +you grant is automatically extended to all recipients of the covered +work and works based on it. + + A patent license is "discriminatory" if it does not include within +the scope of its coverage, prohibits the exercise of, or is +conditioned on the non-exercise of one or more of the rights that are +specifically granted under this License. You may not convey a covered +work if you are a party to an arrangement with a third party that is +in the business of distributing software, under which you make payment +to the third party based on the extent of your activity of conveying +the work, and under which the third party grants, to any of the +parties who would receive the covered work from you, a discriminatory +patent license (a) in connection with copies of the covered work +conveyed by you (or copies made from those copies), or (b) primarily +for and in connection with specific products or compilations that +contain the covered work, unless you entered into that arrangement, +or that patent license was granted, prior to 28 March 2007. + + Nothing in this License shall be construed as excluding or limiting +any implied license or other defenses to infringement that may +otherwise be available to you under applicable patent law. + + 12. No Surrender of Others' Freedom. + + If conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot convey a +covered work so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you may +not convey it at all. For example, if you agree to terms that obligate you +to collect a royalty for further conveying from those to whom you convey +the Program, the only way you could satisfy both those terms and this +License would be to refrain entirely from conveying the Program. + + 13. Use with the GNU Affero General Public License. 
+ + Notwithstanding any other provision of this License, you have +permission to link or combine any covered work with a work licensed +under version 3 of the GNU Affero General Public License into a single +combined work, and to convey the resulting work. The terms of this +License will continue to apply to the part which is the covered work, +but the special requirements of the GNU Affero General Public License, +section 13, concerning interaction through a network will apply to the +combination as such. + + 14. Revised Versions of this License. + + The Free Software Foundation may publish revised and/or new versions of +the GNU General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + + Each version is given a distinguishing version number. If the +Program specifies that a certain numbered version of the GNU General +Public License "or any later version" applies to it, you have the +option of following the terms and conditions either of that numbered +version or of any later version published by the Free Software +Foundation. If the Program does not specify a version number of the +GNU General Public License, you may choose any version ever published +by the Free Software Foundation. + + If the Program specifies that a proxy can decide which future +versions of the GNU General Public License can be used, that proxy's +public statement of acceptance of a version permanently authorizes you +to choose that version for the Program. + + Later license versions may give you additional or different +permissions. However, no additional obligations are imposed on any +author or copyright holder as a result of your choosing to follow a +later version. + + 15. Disclaimer of Warranty. + + THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY +APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT +HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY +OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, +THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM +IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF +ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + + 16. Limitation of Liability. + + IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS +THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY +GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE +USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF +DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD +PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), +EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF +SUCH DAMAGES. + + 17. Interpretation of Sections 15 and 16. + + If the disclaimer of warranty and limitation of liability provided +above cannot be given local legal effect according to their terms, +reviewing courts shall apply local law that most closely approximates +an absolute waiver of all civil liability in connection with the +Program, unless a warranty or assumption of liability accompanies a +copy of the Program in return for a fee. 
+
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+  If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+    <program>  Copyright (C) <year>  <name of author>
+    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, your program's commands
+might be different; for a GUI interface, you would use an "about box".
+
+  You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU GPL, see
+<https://www.gnu.org/licenses/>.
+
+  The GNU General Public License does not permit incorporating your program
+into proprietary programs.  If your program is a subroutine library, you
+may consider it more useful to permit linking proprietary applications with
+the library.  If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.  But first, please read
+<https://www.gnu.org/licenses/why-not-lgpl.html>.
+
+/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp/3rd_party/include/opentracing/variant/LICENSE
+----------------------------------------------------------------------------------------------------------------
+Copyright (c) MapBox
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification,
+are permitted provided that the following conditions are met:
+
+- Redistributions of source code must retain the above copyright notice, this
+  list of conditions and the following disclaimer.
+- Redistributions in binary form must reproduce the above copyright notice, this
+  list of conditions and the following disclaimer in the documentation and/or
+  other materials provided with the distribution.
+- Neither the name "MapBox" nor the names of its contributors may be
+  used to endorse or promote products derived from this software without
+  specific prior written permission.
+ +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR +ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON +ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +/pytorch/third_party/opentelemetry-cpp/tools/vcpkg/LICENSE.txt +-------------------------------------------------------------- +MIT License + +Copyright (c) Microsoft Corporation + +Permission is hereby granted, free of charge, to any person obtaining a copy of this +software and associated documentation files (the "Software"), to deal in the Software +without restriction, including without limitation the rights to use, copy, modify, +merge, publish, distribute, sublicense, and/or sell copies of the Software, and to +permit persons to whom the Software is furnished to do so, subject to the following +conditions: + +The above copyright notice and this permission notice shall be included in all copies +or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, +INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A +PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT +HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF +CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE +OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + + +/pytorch/third_party/opentelemetry-cpp/tools/vcpkg/ports/vulkan/LICENSE.txt +--------------------------------------------------------------------------- +/* +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + + +Apache License +Version 2.0, January 2004 +http://www.apache.org/licenses/ + +TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + +1. Definitions. + +"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. + +"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. + +"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. 
For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. + +"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. + +"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. + +"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. + +"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). + +"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. + +"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." + +"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. + +2. Grant of Copyright License. + +Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. + +3. Grant of Patent License. + +Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. 
If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. + +4. Redistribution. + +You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: + +You must give any other recipients of the Work or Derivative Works a copy of this License; and +You must cause any modified files to carry prominent notices stating that You changed the files; and +You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and +If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. +You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. + +5. Submission of Contributions. + +Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. + +6. Trademarks. + +This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. + +7. Disclaimer of Warranty. + +Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. 
You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
+
+8. Limitation of Liability.
+
+In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
+
+9. Accepting Warranty or Additional Liability.
+
+While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
+
+END OF TERMS AND CONDITIONS
+
+===============================================================================================================================================
+
+//Copyright (C) 2012 LunarG, Inc.
+//All rights reserved.
+//
+//Redistribution and use in source and binary forms, with or without
+//modification, are permitted provided that the following conditions
+//are met:
+//
+// Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// Redistributions in binary form must reproduce the above
+// copyright notice, this list of conditions and the following
+// disclaimer in the documentation and/or other materials provided
+// with the distribution.
+//
+// Neither the name of LunarG Inc. nor the names of its
+// contributors may be used to endorse or promote products derived
+// from this software without specific prior written permission.
+//
+//THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+//"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+//LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+//FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+//COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+//INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+//BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+//LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+//CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+//LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+//ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+//POSSIBILITY OF SUCH DAMAGE.
+
+===============================================================================================================================================
+
+#=============================================================================
+# Copyright 2007-2009 Kitware, Inc.
+# Copyright 2007-2008 Miguel A. Figueroa-Villanueva
+#
+# Distributed under the OSI-approved BSD License (the "License");
+# see accompanying file Copyright_cmake.txt for details.
+#
+# This software is distributed WITHOUT ANY WARRANTY; without even the
+# implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+# See the License for more information.
+#=============================================================================
+# (To distribute this file outside of CMake, substitute the full
+# License text for the above reference.)
+
+
+==============================================================================================================================================
+
+//
+// Copyright (C) 2015-2018 Google, Inc.
+// Copyright (C)
+//
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+//
+// Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// Redistributions in binary form must reproduce the above
+// copyright notice, this list of conditions and the following
+// disclaimer in the documentation and/or other materials provided
+// with the distribution.
+//
+// Neither the name of 3Dlabs Inc. Ltd. nor the names of its
+// contributors may be used to endorse or promote products derived
+// from this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+// COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+// POSSIBILITY OF SUCH DAMAGE.
+//
+
+==========================================================================================================================================
+
+Note: This license has also been called the "New BSD License" or "Modified BSD License". See also the 2-clause BSD License.
+Copyright <YEAR> <COPYRIGHT HOLDER>
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +========================================================================================================================================== + +/* +* xxHash - Fast Hash algorithm +* Copyright (C) 2012-2016, Yann Collet +* +* BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php) +* +* Redistribution and use in source and binary forms, with or without +* modification, are permitted provided that the following conditions are +* met: +* +* * Redistributions of source code must retain the above copyright +* notice, this list of conditions and the following disclaimer. +* * Redistributions in binary form must reproduce the above +* copyright notice, this list of conditions and the following disclaimer +* in the documentation and/or other materials provided with the +* distribution. +* +* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +* +* You can contact the author at : +* - xxHash homepage: http://www.xxhash.com +* - xxHash source repository : https://github.com/Cyan4973/xxHash +*/ + + +=========================================================================================================================================== + +# Copyright (C) 2018 Google, Inc. +# +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# Redistributions in binary form must reproduce the above +# copyright notice, this list of conditions and the following +# disclaimer in the documentation and/or other materials provided +# with the distribution. +# +# Neither the name of Google Inc. nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. 
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+# FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+# COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+# BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+# POSSIBILITY OF SUCH DAMAGE.
+
+==========================================================================================================================================
+
+/* A Bison parser, made by GNU Bison 3.0.4. */
+
+/* Bison implementation for Yacc-like parsers in C
+Copyright (C) 1984, 1989-1990, 2000-2015 Free Software Foundation, Inc.
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+You should have received a copy of the GNU General Public License
+along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+/* As a special exception, you may create a larger work that contains
+part or all of the Bison parser skeleton and distribute that work
+under terms of your choice, so long as that work isn't itself a
+parser generator using the skeleton or a modified version thereof
+as a parser skeleton. Alternatively, if you modify or redistribute
+the parser skeleton itself, you may (at your option) remove this
+special exception, which will cause the skeleton and the resulting
+Bison output files to be licensed under the GNU General Public
+License without this special exception.
+This special exception was added by the Free Software Foundation in
+version 2.2 of Bison. */
+
+/* C LALR(1) parser skeleton written by Richard Stallman, by
+simplifying the original so-called "semantic" parser. */
+
+/* All symbols defined below should begin with yy or YY, to avoid
+infringing on user name space. This should be done even for local
+variables, as they might otherwise be expanded by user macros.
+There are some unavoidable exceptions within include files to
+define necessary library symbols; they are noted "INFRINGES ON
+USER NAME SPACE" below.
*/
+
+==============================================================================================================================================
+
+Copyright (c) 2017 The Khronos Group Inc.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and/or associated documentation files (the "Materials"),
+to deal in the Materials without restriction, including without limitation
+the rights to use, copy, modify, merge, publish, distribute, sublicense,
+and/or sell copies of the Materials, and to permit persons to whom the
+Materials are furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Materials.
+
+MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS KHRONOS
+STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS SPECIFICATIONS AND
+HEADER INFORMATION ARE LOCATED AT https://www.khronos.org/registry/
+
+THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE MATERIALS OR THE USE OR OTHER DEALINGS
+IN THE MATERIALS.
+
+=============================================================================================================================================
+
+CMake - Cross Platform Makefile Generator
+Copyright 2000-2009 Kitware, Inc., Insight Software Consortium
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the names of Kitware, Inc., the Insight Software Consortium,
+nor the names of their contributors may be used to endorse or promote
+products derived from this software without specific prior written
+permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+------------------------------------------------------------------------------
+
+The above copyright and license notice applies to distributions of
+CMake in source and binary form.
Some source files contain additional +notices of original copyright by their contributors; see each source +for details. Third-party software packages supplied with CMake under +compatible licenses provide their own copyright notices documented in +corresponding subdirectories. + +------------------------------------------------------------------------------ + +CMake was initially developed by Kitware with the following sponsorship: + +* National Library of Medicine at the National Institutes of Health +as part of the Insight Segmentation and Registration Toolkit (ITK). + +* US National Labs (Los Alamos, Livermore, Sandia) ASC Parallel +Visualization Initiative. + +* National Alliance for Medical Image Computing (NAMIC) is funded by the +National Institutes of Health through the NIH Roadmap for Medical Research, +Grant U54 EB005149. + +* Kitware, Inc. + +======================================================================================================================================== + +The authors of this software are Rob Pike and Ken Thompson. +* Copyright (c) 2002 by Lucent Technologies. +* Permission to use, copy, modify, and distribute this software for any +* purpose without fee is hereby granted, provided that this entire notice +* is included in all copies of any software which is or includes a copy +* or modification of this software and in all copies of the supporting +* documentation for such software. +* THIS SOFTWARE IS BEING PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED +* WARRANTY. IN PARTICULAR, NEITHER THE AUTHORS NOR LUCENT TECHNOLOGIES MAKE ANY +* REPRESENTATION OR WARRANTY OF ANY KIND CONCERNING THE MERCHANTABILITY +* OF THIS SOFTWARE OR ITS FITNESS FOR ANY PARTICULAR PURPOSE. + + +======================================================================================================================================== + +Copyright (c) 2015-2018 Baldur Karlsson + +Copyright (c) 2014 Crytek + +Copyright (c) 1998-2018 Third party code and tools + +Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + +========================================================================================================================================= + +/* +Copyright (c) 2009 Dave Gamble +Copyright (c) 2015-2016 The Khronos Group Inc. +Copyright (c) 2015-2016 Valve Corporation +Copyright (c) 2015-2016 LunarG, Inc. 
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+*/
+
+===========================================================================================================================================
+
+Copyright (c) 2005 - 2017 G-Truc Creation
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+
+==========================================================================================================================================
+
+/*
+The JsonCpp library's source code, including accompanying documentation,
+tests and demonstration applications, are licensed under the following
+conditions...
+The author (Baptiste Lepilleur) explicitly disclaims copyright in all
+jurisdictions which recognize such a disclaimer. In such jurisdictions,
+this software is released into the Public Domain.
+In jurisdictions which do not recognize Public Domain property (e.g. Germany as of
+2010), this software is Copyright (c) 2007-2010 by Baptiste Lepilleur, and is
+released under the terms of the MIT License (see below).
+In jurisdictions which recognize Public Domain property, the user of this
+software may choose to accept it either as 1) Public Domain, 2) under the
+conditions of the MIT License (see below), or 3) under the terms of dual
+Public Domain/MIT License conditions described here, as they choose.
+The MIT License is about as close to Public Domain as a license can get, and is +described in clear, concise terms at: +http://en.wikipedia.org/wiki/MIT_License + +The full text of the MIT License follows: + +Copyright (c) 2007-2010 Baptiste Lepilleur +Permission is hereby granted, free of charge, to any person +obtaining a copy of this software and associated documentation +files (the "Software"), to deal in the Software without +restriction, including without limitation the rights to use, copy, +modify, merge, publish, distribute, sublicense, and/or sell copies +of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: +The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software. +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS +BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN +ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN +CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + +========================================================================================================================================== + +/** +* `murmurhash.h' - murmurhash +* +* copyright (c) 2014 joseph werle +* Copyright (c) 2015-2016 The Khronos Group Inc. +* Copyright (c) 2015-2016 Valve Corporation +* Copyright (c) 2015-2016 LunarG, Inc. +* +* Permission is hereby granted, free of charge, to any person obtaining a copy +* of this software and/or associated documentation files (the "Materials"), to +* deal in the Materials without restriction, including without limitation the +* rights to use, copy, modify, merge, publish, distribute, sublicense, and/or +* sell copies of the Materials, and to permit persons to whom the Materials are +* furnished to do so, subject to the following conditions: +* +* The above copyright notice(s) and this permission notice shall be included in +* all copies or substantial portions of the Materials. +* +* THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. +* +* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, +* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR +* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE MATERIALS OR THE +* USE OR OTHER DEALINGS IN THE MATERIALS. +*/ + +========================================================================================================================================= + +Licenced as X11: http://www.kryogenix.org/code/browser/licence.html +This basically means: do what you want with it. 
+ +========================================================================================================================================= + +/////////////////////////////////////////////////////////////////////////////////// +/// OpenGL Mathematics (glm.g-truc.net) +/// +/// Copyright (c) 2005 - 2014 G-Truc Creation (www.g-truc.net) +/// Permission is hereby granted, free of charge, to any person obtaining a copy +/// of this software and associated documentation files (the "Software"), to deal +/// in the Software without restriction, including without limitation the rights +/// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +/// copies of the Software, and to permit persons to whom the Software is +/// furnished to do so, subject to the following conditions: +/// +/// The above copyright notice and this permission notice shall be included in +/// all copies or substantial portions of the Software. +/// +/// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +/// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +/// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +/// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +/// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +/// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +/// THE SOFTWARE. +/// +/// @ref core +/// @file glm/common.hpp +/// @date 2013-12-24 / 2013-12-24 +/// @author Christophe Riccio +/////////////////////////////////////////////////////////////////////////////////// + + +========================================================================================================================================== + +// LICENSE +// +// This software is in the public domain. Where that dedication is not +// recognized, you are granted a perpetual, irrevocable license to copy, +// distribute, and modify this file as you see fit. +// + +========================================================================================================================================== + +Simple DirectMedia Layer +Copyright (C) 1997-2018 Sam Lantinga + +This software is provided 'as-is', without any express or implied +warranty. In no event will the authors be held liable for any damages +arising from the use of this software. + +Permission is granted to anyone to use this software for any purpose, +including commercial applications, and to alter it and redistribute it +freely, subject to the following restrictions: + +1. The origin of this software must not be misrepresented; you must not +claim that you wrote the original software. If you use this software +in a product, an acknowledgment in the product documentation would be +appreciated but is not required. +2. Altered source versions must be plainly marked as such, and must not be +misrepresented as being the original software. +3. This notice may not be removed or altered from any source distribution. + +========================================================================================================================================= + +/****************************************************************************\ +Copyright (c) 2002, NVIDIA Corporation. + +NVIDIA Corporation("NVIDIA") supplies this software to you in +consideration of your agreement to the following terms, and your use, +installation, modification or redistribution of this NVIDIA software +constitutes acceptance of these terms. 
If you do not agree with these +terms, please do not use, install, modify or redistribute this NVIDIA +software. + +In consideration of your agreement to abide by the following terms, and +subject to these terms, NVIDIA grants you a personal, non-exclusive +license, under NVIDIA's copyrights in this original NVIDIA software (the +NVIDIA Software), to use, reproduce, modify and redistribute the +NVIDIA Software, with or without modifications, in source and/or binary +forms; provided that if you redistribute the NVIDIA Software, you must +retain the copyright notice of NVIDIA, this notice and the following +text and disclaimers in all such redistributions of the NVIDIA Software. +Neither the name, trademarks, service marks nor logos of NVIDIA +Corporation may be used to endorse or promote products derived from the +NVIDIA Software without specific prior written permission from NVIDIA. +Except as expressly stated in this notice, no other rights or licenses +express or implied, are granted by NVIDIA herein, including but not +limited to any patent rights that may be infringed by your derivative +works or by other works in which the NVIDIA Software may be +incorporated. No hardware is licensed hereunder. + +THE NVIDIA SOFTWARE IS BEING PROVIDED ON AN "AS IS" BASIS, WITHOUT +WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, +INCLUDING WITHOUT LIMITATION, WARRANTIES OR CONDITIONS OF TITLE, +NON-INFRINGEMENT, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR +ITS USE AND OPERATION EITHER ALONE OR IN COMBINATION WITH OTHER +PRODUCTS. + +IN NO EVENT SHALL NVIDIA BE LIABLE FOR ANY SPECIAL, INDIRECT, +INCIDENTAL, EXEMPLARY, CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED +TO, LOST PROFITS; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) OR ARISING IN ANY WAY +OUT OF THE USE, REPRODUCTION, MODIFICATION AND/OR DISTRIBUTION OF THE +NVIDIA SOFTWARE, HOWEVER CAUSED AND WHETHER UNDER THEORY OF CONTRACT, +TORT (INCLUDING NEGLIGENCE), STRICT LIABILITY OR OTHERWISE, EVEN IF +NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +\****************************************************************************/ + +================================================================================================================================================== + +This software is provided 'as-is', without any express or implied +warranty. In no event will the authors be held liable for any damages +arising from the use of this software. + +Permission is granted to anyone to use this software for any purpose, +including commercial applications, and to alter it and redistribute it +freely, subject to the following restrictions: + +1. The origin of this software must not be misrepresented; you must not + claim that you wrote the original software. If you use this software + in a product, an acknowledgment in the product documentation would be + appreciated but is not required. +2. Altered source versions must be plainly marked as such, and must not be + misrepresented as being the original software. +3. This notice may not be removed or altered from any source distribution. + + +================================================================================================================================================== + +GNU LESSER GENERAL PUBLIC LICENSE +Version 3, 29 June 2007 + +Copyright (C) 2007 Free Software Foundation, Inc. 
+ +Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. + +This version of the GNU Lesser General Public License incorporates the terms and conditions of version 3 of the GNU General Public License, supplemented by the additional permissions listed below. + +0. Additional Definitions. + +As used herein, "this License" refers to version 3 of the GNU Lesser General Public License, and the "GNU GPL" refers to version 3 of the GNU General Public License. + +"The Library" refers to a covered work governed by this License, other than an Application or a Combined Work as defined below. + +An "Application" is any work that makes use of an interface provided by the Library, but which is not otherwise based on the Library. Defining a subclass of a class defined by the Library is deemed a mode of using an interface provided by the Library. + +A "Combined Work" is a work produced by combining or linking an Application with the Library. The particular version of the Library with which the Combined Work was made is also called the "Linked Version". + +The "Minimal Corresponding Source" for a Combined Work means the Corresponding Source for the Combined Work, excluding any source code for portions of the Combined Work that, considered in isolation, are based on the Application, and not on the Linked Version. + +The "Corresponding Application Code" for a Combined Work means the object code and/or source code for the Application, including any data and utility programs needed for reproducing the Combined Work from the Application, but excluding the System Libraries of the Combined Work. + +1. Exception to Section 3 of the GNU GPL. + +You may convey a covered work under sections 3 and 4 of this License without being bound by section 3 of the GNU GPL. + +2. Conveying Modified Versions. + +If you modify a copy of the Library, and, in your modifications, a facility refers to a function or data to be supplied by an Application that uses the facility (other than as an argument passed when the facility is invoked), then you may convey a copy of the modified version: + +a) under this License, provided that you make a good faith effort to ensure that, in the event an Application does not supply the function or data, the facility still operates, and performs whatever part of its purpose remains meaningful, or +b) under the GNU GPL, with none of the additional permissions of this License applicable to that copy. +3. Object Code Incorporating Material from Library Header Files. + +The object code form of an Application may incorporate material from a header file that is part of the Library. You may convey such object code under terms of your choice, provided that, if the incorporated material is not limited to numerical parameters, data structure layouts and accessors, or small macros, inline functions and templates (ten or fewer lines in length), you do both of the following: + +a) Give prominent notice with each copy of the object code that the Library is used in it and that the Library and its use are covered by this License. +b) Accompany the object code with a copy of the GNU GPL and this license document. +4. Combined Works. 
+ +You may convey a Combined Work under terms of your choice that, taken together, effectively do not restrict modification of the portions of the Library contained in the Combined Work and reverse engineering for debugging such modifications, if you also do each of the following: + +a) Give prominent notice with each copy of the Combined Work that the Library is used in it and that the Library and its use are covered by this License. +b) Accompany the Combined Work with a copy of the GNU GPL and this license document. +c) For a Combined Work that displays copyright notices during execution, include the copyright notice for the Library among these notices, as well as a reference directing the user to the copies of the GNU GPL and this license document. +d) Do one of the following: +0) Convey the Minimal Corresponding Source under the terms of this License, and the Corresponding Application Code in a form suitable for, and under terms that permit, the user to recombine or relink the Application with a modified version of the Linked Version to produce a modified Combined Work, in the manner specified by section 6 of the GNU GPL for conveying Corresponding Source. +1) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (a) uses at run time a copy of the Library already present on the user's computer system, and (b) will operate properly with a modified version of the Library that is interface-compatible with the Linked Version. +e) Provide Installation Information, but only if you would otherwise be required to provide such information under section 6 of the GNU GPL, and only to the extent that such information is necessary to install and execute a modified version of the Combined Work produced by recombining or relinking the Application with a modified version of the Linked Version. (If you use option 4d0, the Installation Information must accompany the Minimal Corresponding Source and Corresponding Application Code. If you use option 4d1, you must provide the Installation Information in the manner specified by section 6 of the GNU GPL for conveying Corresponding Source.) +5. Combined Libraries. + +You may place library facilities that are a work based on the Library side by side in a single library together with other library facilities that are not Applications and are not covered by this License, and convey such a combined library under terms of your choice, if you do both of the following: + +a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities, conveyed under the terms of this License. +b) Give prominent notice with the combined library that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. +6. Revised Versions of the GNU Lesser General Public License. + +The Free Software Foundation may publish revised and/or new versions of the GNU Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. + +Each version is given a distinguishing version number. If the Library as you received it specifies that a certain numbered version of the GNU Lesser General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that published version or of any later version published by the Free Software Foundation. 
If the Library as you received it does not specify a version number of the GNU Lesser General Public License, you may choose any version of the GNU Lesser General Public License ever published by the Free Software Foundation. + +If the Library as you received it specifies that a proxy can decide whether future versions of the GNU Lesser General Public License shall apply, that proxy's public statement of acceptance of any version is permanent authorization for you to choose that version for the Library. diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torch-2.10.0+cu128.dist-info/licenses/NOTICE b/URSA/.venv_ursa/lib/python3.12/site-packages/torch-2.10.0+cu128.dist-info/licenses/NOTICE new file mode 100644 index 0000000000000000000000000000000000000000..6effb8b5d70709f90835f2f5d646352fd77b6943 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torch-2.10.0+cu128.dist-info/licenses/NOTICE @@ -0,0 +1,456 @@ +======================================================================= +Software under third_party +======================================================================= +Software libraries under third_party are provided as github submodule +links, and their content is not part of the Caffe2 codebase. Their +licences can be found under the respective software repositories. + +======================================================================= +Earlier BSD License +======================================================================= +Early development of Caffe2 in 2015 and early 2016 is licensed under the +BSD license. The license is attached below: + +All contributions by Facebook: +Copyright (c) 2016 Facebook Inc. + +All contributions by Google: +Copyright (c) 2015 Google Inc. +All rights reserved. + +All contributions by Yangqing Jia: +Copyright (c) 2015 Yangqing Jia +All rights reserved. + +All contributions by Kakao Brain: +Copyright 2019-2020 Kakao Brain + +All other contributions: +Copyright(c) 2015, 2016 the respective contributors +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. +2. Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR +ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ + +======================================================================= +Caffe's BSD License +======================================================================= +Some parts of the caffe2 code is derived from the original Caffe code, which is +created by Yangqing Jia and is now a BSD-licensed open-source project. The Caffe +license is as follows: + +COPYRIGHT + +All contributions by the University of California: +Copyright (c) 2014, The Regents of the University of California (Regents) +All rights reserved. + +All other contributions: +Copyright (c) 2014, the respective contributors +All rights reserved. + +Caffe uses a shared copyright model: each contributor holds copyright over +their contributions to Caffe. The project versioning records all such +contribution and copyright details. If a contributor wants to further mark +their specific copyright on a particular contribution, they should indicate +their copyright solely in the commit message of the change when it is +committed. + +LICENSE + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. +2. Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR +ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +CONTRIBUTION AGREEMENT + +By contributing to the BVLC/caffe repository through pull-request, comment, +or otherwise, the contributor releases their content to the +license and copyright terms herein. + +======================================================================= +Caffe2's Apache License +======================================================================= + +This repo contains Caffe2 code, which was previously licensed under +Apache License Version 2.0: + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. 
For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. 
This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + +======================================================================= +Cephes's 3-Clause BSD License +======================================================================= + +Code derived from implementations in the Cephes Math Library should mention +its derivation and reference the following license: + + 3-Clause BSD License for the Cephes Math Library + Copyright (c) 2018, Steven Moshier + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the nor the + names of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND + ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + DISCLAIMED. IN NO EVENT SHALL Steven Moshier BE LIABLE FOR ANY + DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +======================================================================= +SciPy's 3-Clause BSD License +======================================================================= + +Code derived from implementations in SciPy should mention its derivation +and reference the following license: + + Copyright (c) 2001-2002 Enthought, Inc. 2003-2019, SciPy Developers. + All rights reserved. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + + 2. Redistributions in binary form must reproduce the above + copyright notice, this list of conditions and the following + disclaimer in the documentation and/or other materials provided + with the distribution. + + 3. Neither the name of the copyright holder nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +======================================================================= +Boost's 1.0 Software License +======================================================================= + +Code derived from implementations in Boost 1.0 should mention its +derivation and reference the following license: + + Boost Software License - Version 1.0 - August 17th, 2003 + + Permission is hereby granted, free of charge, to any person or organization + obtaining a copy of the software and accompanying documentation covered by + this license (the "Software") to use, reproduce, display, distribute, + execute, and transmit the Software, and to prepare derivative works of the + Software, and to permit third-parties to whom the Software is furnished to + do so, all subject to the following: + + The copyright notices in the Software and this entire statement, including + the above license grant, this restriction and the following disclaimer, + must be included in all copies of the Software, in whole or in part, and + all derivative works of the Software, unless such copies or derivative + works are solely in the form of machine-executable object code generated by + a source language processor. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT + SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE + FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, + ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + DEALINGS IN THE SOFTWARE. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + +======================================================================= +PILLOW-SIMD Software License +======================================================================= + +Code derived from implementations in PILLOW-SIMD should mention its derivation +and reference the following license: + + The Python Imaging Library (PIL) is + + Copyright © 1997-2011 by Secret Labs AB + Copyright © 1995-2011 by Fredrik Lundh + + Pillow is the friendly PIL fork. 
It is + + Copyright © 2010-2022 by Alex Clark and contributors + + Like PIL, Pillow is licensed under the open source HPND License: + + By obtaining, using, and/or copying this software and/or its associated + documentation, you agree that you have read, understood, and will comply + with the following terms and conditions: + + Permission to use, copy, modify, and distribute this software and its + associated documentation for any purpose and without fee is hereby granted, + provided that the above copyright notice appears in all copies, and that + both that copyright notice and this permission notice appear in supporting + documentation, and that the name of Secret Labs AB or the author not be + used in advertising or publicity pertaining to distribution of the software + without specific, written prior permission. + + SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS + SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. + IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR ANY SPECIAL, + INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM + LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE + OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR + PERFORMANCE OF THIS SOFTWARE. diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..b6a8b3e709b276398230bb8b74e0eb780a528ac7 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/code_template.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/code_template.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5709ba10398d2043acaca693e1d2314f30df2c58 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/code_template.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/context.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/context.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3ed6596555d80c9544c8329b8460fbbe43f983c0 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/context.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_aoti_c_shim.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_aoti_c_shim.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6b09cf4830e39bf2b5fd2b9f80eae1de9589b208 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_aoti_c_shim.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_backend_stubs.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_backend_stubs.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5f436c07da4aacdfa72127a84f1d7fe71f8b9907 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_backend_stubs.cpython-312.pyc differ 
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_functionalization_type.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_functionalization_type.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..b883f4710b170cfb925323d08e9e52f906d53da1 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_functionalization_type.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_lazy_tensor.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_lazy_tensor.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..24bd8cb2a4830381d2b2d2056ed98668f6d761f8 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_lazy_tensor.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_schema_utils.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_schema_utils.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ee4056782b5443b2eaa0fe984fb028688972f6ac Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_schema_utils.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_vmap_plumbing.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_vmap_plumbing.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..acddad969d3bb15091e2f536ab0d0ba9895d425d Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/gen_vmap_plumbing.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/local.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/local.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e48168e68d865d856d0257c43718ee316dd2e170 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/local.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/native_function_generation.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/native_function_generation.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5c53f58bab7897989e610ef970fe75efdd3c6dab Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/native_function_generation.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/utils.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/utils.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6ebbaea14208f95053d7625fc647e725c57a5bbd Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/utils.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/yaml_utils.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/yaml_utils.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..0dc66ce82b43416487ec943a788a4690a7101e3a Binary files /dev/null and 
b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/__pycache__/yaml_utils.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/aoti/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/aoti/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/aoti/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/aoti/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..98d1960cd0aaaf3a0d9539cb4a865a301abd499e Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/aoti/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/aoti/__pycache__/fallback_ops.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/aoti/__pycache__/fallback_ops.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..f2f21af36fdd69c07b0feb70d997c7ef448a67f7 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/aoti/__pycache__/fallback_ops.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/aoti/fallback_ops.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/aoti/fallback_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..f78cc85e22676edfa5ec90e5c6f204f5bfaea10a --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/aoti/fallback_ops.py @@ -0,0 +1,194 @@ +# Be extra careful when you edit this file, because it affects AOTInductor ABI compatibility. See +# https://github.com/pytorch/pytorch/blob/7e86a7c0155295539996e0cf422883571126073e/torchgen/gen.py#L2424-L2436 +# for details. +# +# The inductor_fallback_ops list is based on the fallback ops from torch/_inductor/lowering.py. +# +# Generally speaking, it is ok to add a new op to the list, but you need to run +# `python torchgen/gen.py --update-aoti-c-shim` in order to regenerate C shim header files. +# But it is NOT ok to remove an existing fallback op from the list, since that will break +# some existing AOTInductor-compiled models. +# +# A fallback op version defaults to 1. If you want to extend an existing fallback op by adding +# a new argument with a default value, while it is fine in the Python world, it will be BC-breaking +# when generating C shim. Thus you need to bump up the version number of that fallback op by +# updating the entry in the inductor_fallback_ops list, adding a new version number with a list +# of new arguments, and then run `python torchgen/gen.py --update-aoti-c-shim` to regenerate. 
+ +inductor_fallback_ops: dict[str, dict[str, list[str]]] = { + "aten._adaptive_avg_pool2d_backward.default": {}, + "aten._adaptive_avg_pool2d.default": {}, + "aten._adaptive_avg_pool3d_backward.default": {}, + "aten._adaptive_avg_pool3d.default": {}, + "aten._addmm_activation.default": {}, + "aten._cdist_backward.default": {}, + "aten._cdist_forward.default": {}, + "aten._cudnn_rnn.default": {}, + "aten._dyn_quant_matmul_4bit.default": {}, + "aten._dyn_quant_pack_4bit_weight.default": {}, + "aten._efficient_attention_backward.default": {}, + "aten._efficient_attention_forward.default": {}, + "aten._efficientzerotensor.default": {}, + "aten._embedding_bag_dense_backward.default": {}, + "aten._embedding_bag_forward_only.default": {}, + "aten._embedding_bag_per_sample_weights_backward.default": {}, + "aten._embedding_bag.default": {}, + "aten._fft_c2c.default": {}, + "aten._fft_r2c.default": {}, + "aten._flash_attention_backward.default": {}, + "aten._flash_attention_forward.default": {}, + "aten._fused_moving_avg_obs_fq_helper_functional.default": {}, + "aten._fused_moving_avg_obs_fq_helper.default": {}, + "aten._fused_rms_norm.default": {}, + "aten._histogramdd_from_bin_cts.default": {}, + "aten._int_mm.out": {}, + "aten._pdist_backward.default": {}, + "aten._pdist_forward.default": {}, + "aten._scaled_dot_product_attention_math_for_mps.default": {}, + "aten._scaled_dot_product_cudnn_attention_backward.default": {}, + "aten._scaled_dot_product_cudnn_attention.default": {}, + "aten._scaled_dot_product_efficient_attention_backward.default": {}, + "aten._scaled_dot_product_efficient_attention.default": {}, + "aten._scaled_dot_product_flash_attention_backward.default": {}, + "aten._scaled_dot_product_flash_attention_for_cpu_backward.default": {}, + "aten._scaled_dot_product_flash_attention_for_cpu.default": {}, + "aten._scaled_dot_product_flash_attention.default": {}, + "aten._scaled_dot_product_fused_attention_overrideable_backward.default": {}, + "aten._scaled_dot_product_fused_attention_overrideable.default": {}, + "aten._scaled_mm.default": {}, + "aten._scaled_grouped_mm.default": {}, + "aten._scaled_mm.out": {}, + "aten._segment_reduce_backward.default": {}, + "aten._thnn_fused_lstm_cell.default": {}, + "aten._to_sparse.default": {}, + "aten._trilinear.default": {}, + "aten._weight_int4pack_mm.default": {}, + "aten._weight_int8pack_mm.default": {}, + "aten.abs.default": {}, + "aten.adaptive_max_pool2d_backward.default": {}, + "aten.adaptive_max_pool2d.default": {}, + "aten.adaptive_max_pool3d_backward.default": {}, + "aten.adaptive_max_pool3d.default": {}, + "aten.add.Scalar": {}, + "aten.add.Tensor": {}, + "aten.addbmm.default": {}, + "aten.addmm.out": {}, + "aten.addmv.default": {}, + "aten.angle.default": {}, + "aten.avg_pool2d_backward.default": {}, + "aten.avg_pool2d.default": {}, + "aten.avg_pool3d_backward.default": {}, + "aten.avg_pool3d.default": {}, + "aten.baddbmm.out": {}, + "aten.bernoulli_.float": {}, + "aten.bernoulli_.Tensor": {}, + "aten.bmm.out": {}, + "aten.bucketize.Tensor": {}, + "aten.cat.default": {}, + "aten.cholesky_inverse.default": {}, + "aten.cholesky_solve.default": {}, + "aten.convolution_backward.default": {}, + "aten.convolution.default": {}, + "aten.cummax.default": {}, + "aten.cummin.default": {}, + "aten.cumprod.default": {}, + "aten.cumsum.default": {}, + "aten.exponential.default": {}, + "aten.fill_.Scalar": {}, + "aten.fractional_max_pool2d_backward.default": {}, + "aten.fractional_max_pool2d.default": {}, + 
"aten.fractional_max_pool3d_backward.default": {}, + "aten.fractional_max_pool3d.default": {}, + "aten.gcd.default": {}, + "aten.geqrf.default": {}, + "aten.grid_sampler_2d_backward.default": {}, + "aten.hann_window.default": {}, + "aten.histc.default": {}, + "aten.histogram.bin_ct": {}, + "aten.index_put.default": {}, + "aten.index_reduce.default": {}, + "aten.index.Tensor": {}, + "aten.kthvalue.default": {}, + "aten.logcumsumexp.default": {}, + "aten.lu_unpack.default": {}, + "aten.masked_scatter_backward.default": {}, + "aten.masked_scatter.default": {}, + "aten.masked_select.default": {}, + "aten.max_pool2d_with_indices_backward.default": {}, + "aten.max_pool2d_with_indices.default": {}, + "aten.max_pool3d_with_indices_backward.default": {}, + "aten.max_pool3d_with_indices.default": {}, + "aten.max_unpool2d.default": {}, + "aten.max_unpool3d.default": {}, + "aten.median.default": {}, + "aten.mm.out": {}, + "aten.mode.default": {}, + "aten.mul.Scalar": {}, + "aten.mul.Tensor": {}, + "aten.nanmedian.default": {}, + "aten.narrow.default": {}, + "aten.native_dropout.default": {}, + "aten.nonzero.default": {}, + "aten.normal_functional.default": {}, + "aten.ormqr.default": {}, + "aten.pad.default": {}, + "aten.permute.default": {}, + "aten.polar.default": {}, + "aten.pow.Scalar": {}, + "aten.pow.Tensor_Scalar": {}, + "aten.pow.Tensor_Tensor": {}, + "aten.rand.default": {}, + "aten.rand.generator": {}, + "aten.randint.default": {}, + "aten.randint.generator": {}, + "aten.randint.low_out": {}, + "aten.randint.low": {}, + "aten.randn.default": {}, + "aten.randn.generator": {}, + "aten.randperm.default": {}, + "aten.repeat_interleave.Tensor": {}, + "aten.replication_pad1d_backward.default": {}, + "aten.replication_pad2d_backward.default": {}, + "aten.reshape.default": {}, + "aten.resize_.default": {}, + "aten.resize_as_.default": {}, + "aten.scatter_reduce.two_out": {}, + "aten.scatter.src_out": {}, + "aten.scatter.value_out": {}, + "aten.searchsorted.Scalar": {}, + "aten.searchsorted.Tensor": {}, + "aten.segment_reduce.default": {}, + "aten.set_.source_Tensor": {}, + "aten.slice.Tensor": {}, + "aten.soft_margin_loss_backward.default": {}, + "aten.sort.default": {}, + "aten.sort.stable": {}, + "aten.squeeze.dim": {}, + "aten.to_sparse.default": {}, + "aten.topk.default": {}, + "aten.triangular_solve.default": {}, + "aten.uniform.default": {}, + "aten.upsample_bicubic2d_backward.default": {}, + "aten.upsample_linear1d_backward.default": {}, + "aten.upsample_trilinear3d_backward.default": {}, + "aten.view_as_complex.default": {}, + "aten.view_as_real.default": {}, + "aten.view.dtype": {}, + "aten._weight_int4pack_mm_with_scales_and_zeros.default": {}, +} + +# `python torchgen/gen.py --update-aoti-c-shim` will automatically generate +# c_shim_aten.{h/cpp} based on the list below. +# Operators in this list are intended to be used in torch/csrc/stable/ops.h +# Unlike other c_shims, operators in this file do not bypass the dispatcher. +# The same BC rules apply as inductor_fallback_ops. 
+aten_shimified_ops: dict[str, dict[str, list[str]]] = { + "aten.fill_.Scalar": {}, + "aten.pad.default": {}, + "aten.narrow.default": {}, + "aten.amax.default": {}, + "aten.new_empty.default": {}, + "aten.new_zeros.default": {}, + "aten.full.default": {}, + "aten.subtract.Tensor": {}, +} diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..a42adb47a6cbbb9120671e65f322a0a7caf10434 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/autograd.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/autograd.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..b212a3fe596567f02cff67d56f5e76d856160cca Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/autograd.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/cpp.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/cpp.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6be05fe7e4fd059687c1a5b2192553b402b2a856 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/cpp.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/dispatcher.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/dispatcher.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..060705848f5b47bdb4b252350da36cd13ecc4560 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/dispatcher.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/functionalization.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/functionalization.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..1652ee7bd0fbdbb623dc9a095b8f8a98b108f035 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/functionalization.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/lazy.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/lazy.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e08298702641ffa08c21e716f0dddab5f9a65ce5 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/lazy.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/meta.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/meta.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..457ef97747136224923ed14c46b180e277e63ae9 Binary 
files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/meta.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/native.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/native.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..fb0f074d375659b65c99b4d0d8a4bce7e56b9566 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/native.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/python.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/python.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..383d0a5ebc3fec09e762b40b57ca7dd987b7388e Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/python.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/structured.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/structured.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c92d92c1d3a5361a7fa307956a2970dc4a0bb35f Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/structured.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/translate.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/translate.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..2eba1b8412617f88640f5de2a009d705aa1aa1d2 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/translate.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/ufunc.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/ufunc.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..f8591fd311e0a93933c89de451323bc9c130a1b5 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/ufunc.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/unboxing.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/unboxing.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..cfe356f66c085734e1f995a6829e0b1efbcb6638 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/__pycache__/unboxing.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/autograd.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/autograd.py new file mode 100644 index 0000000000000000000000000000000000000000..96e192d3a48a9c72202e28117409ed99bc7377f5 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/autograd.py @@ -0,0 +1,874 @@ +from __future__ import annotations + +import re +from dataclasses import dataclass +from typing import cast, TYPE_CHECKING + +from torchgen import local +from torchgen.api import cpp +from torchgen.api.types import BaseCType, Binding, NamedCType, tensorListT +from torchgen.model import ( + BaseTy, + BaseType, + FunctionSchema, + ListType, + NativeFunction, + NativeFunctionsViewGroup, + 
SchemaKind,
+    Type,
+)
+from torchgen.utils import IDENT_REGEX
+
+
+if TYPE_CHECKING:
+    from collections.abc import Sequence
+
+
+# Represents a saved attribute involved in backward calculation.
+# Note that it can be a derived property of an input argument, e.g.:
+# we could save `other.scalar_type()` instead of the entire `other` tensor.
+@dataclass(frozen=True)
+class SavedAttribute:
+    # The NamedCType holds the updated name and cpp type of the attribute.
+    # For the name, a suffix is appended if it's a derived property, e.g.: `other_scalar_type`
+    nctype: NamedCType
+
+    # The expression to read the derived property at save time, e.g.:
+    # `other.scalar_type()`.
+    expr: str
+
+
+# Represents a backward formula that calculates derivatives for one
+# or more tensors.
+@dataclass(frozen=True)
+class Derivative:
+    # The formula string (legit C++ expression).
+    # Note that expressions against input arguments have been replaced with the
+    # corresponding saved attributes.
+    # E.g.:
+    # raw formula: `mul_tensor_backward(grad, self, other.scalar_type())`
+    # here: `mul_tensor_backward(grad, self, other_scalar_type)`
+    formula: str
+
+    # The formula string before input argument replacement
+    original_formula: str
+
+    # Names of the arguments for which this formula calculates derivatives.
+    var_names: tuple[str, ...]
+
+    # Saved inputs that are referenced by the formula.
+    saved_inputs: tuple[SavedAttribute, ...]
+
+    # Saved outputs that are referenced by the formula.
+    saved_outputs: tuple[SavedAttribute, ...]
+
+    # Gradients that are referenced by name in the formula.
+    named_gradients: set[str]
+
+
+# Represents a forward formula that calculates forward derivatives
+# for one tensor.
+@dataclass(frozen=True)
+class ForwardDerivative:
+    # The formula string (legit C++ expression).
+    # Note that special keywords such as "linear" or "element_wise" have been
+    # replaced by the automatically generated formula.
+    formula: str
+
+    # Names of the output arguments for which this formula calculates forward
+    # derivatives
+    var_names: tuple[str, ...]
+
+    # Types of the output arguments for which this formula calculates forward
+    # derivatives
+    var_types: tuple[Type, ...]
+
+    # Inputs for which the forward derivatives are required for this formula
+    required_inputs_fw_grad: tuple[str, ...] | None
+
+    # Inputs for which the primal is required for this formula
+    required_inputs_primal: tuple[str, ...] | None
+
+    # Flag to specify if this formula requires the original value of self.
+    # This is only used by inplace operations.
+    required_original_self_value: bool
+
+    # If this formula is specified in derivatives.yaml or if we are reusing the
+    # out of place formula for inplace
+    is_reusing_outplace_formula: bool
+
+
+# Represents differentiability info for a NativeFunction.
+@dataclass(frozen=True)
+class DifferentiabilityInfo:
+    # The base name read from derivatives.yaml.
+    name: str
+
+    # The matching native function.
+    #
+    # There can be multiple NativeFunction having the same base name:
+    #  - different overloads with different types of input arguments;
+    #  - in-place/out/functional variants of the same function;
+    #
+    # We first use the schema string (under the 'name' key) in derivatives.yaml
+    # to find the NativeFunction having the same schema string.
+    # Then we find the in-place/out/functional variants of the matching function.
+    # Among these variants, we choose the one having the same name as the
+    # derivatives.yaml entry. If there is no exact match, then we choose the
+    # in-place variant.
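+    # (For example, a derivatives.yaml entry named "abs_" would bind to the
+    # in-place abs_ variant rather than the functional abs; an illustrative
+    # case, not an exhaustive rule.)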
+ # TODO: maybe the logic to search for all variants is no longer necessary? + func: NativeFunction + + # The name of the generated autograd function. + # It's set only if we will calculate a derivative, i.e. + # 'args_with_derivatives' is not empty. + op: str | None + + # The derivatives formulae for this function. + # Note that the length of this sequence is the number of differentiable inputs + derivatives: Sequence[Derivative] + + # The forward derivatives formulae for this function. + # Note that the length of this sequence is the number of differentiable outputs + forward_derivatives: Sequence[ForwardDerivative] + + # The union of 'saved_inputs' of all 'derivatives'. + all_saved_inputs: Sequence[SavedAttribute] + + # The union of 'saved_outputs' of all 'derivatives'. + all_saved_outputs: Sequence[SavedAttribute] + + # All named gradients that are available for use, in the same + # order as in the grads vector. + available_named_gradients: Sequence[str] + + # The named gradients that are used in any of the derivatives. + # Invariant: all(name in available_named_gradients for name in used_named_gradients) + used_named_gradients: set[str] + + # The function's input arguments for which it calculates derivatives. + # It's the union of 'var_names' of all 'derivatives', sorted by the + # argument order in the function schema. + args_with_derivatives: Sequence[Binding] + + # Names of arguments whose derivative formula is 'non_differentiable'. + non_differentiable_arg_names: Sequence[str] + + # Raw data read from derivatives.yaml. + output_differentiability: list[bool] | None + + # output_differentiability in derivatives.yaml can be a list of + # conditions that express if the output is differentiable. In this case, + # the number of conditions must match the number of outputs + # (NB: we only support one condition right now). + # output_differentiability gets populated with True for each condition, + # while output_differentiability_conditions gets populated with the conditions + output_differentiability_conditions: list[str] | None + + @property + def has_derivatives(self) -> bool: + return len(self.args_with_derivatives) > 0 + + # Generates a new DifferentiabilityInfo using the exact same set of derivative information, + # but with a new operator name. + # This is used when generating "copy" variants of view ops, + # which are able to use the exact same derivative formula as the original view op + # See Note [Codegen'd {view}_copy Operators] + def create_view_copy_from_view_derivative( + self, g: NativeFunctionsViewGroup + ) -> DifferentiabilityInfo | None: + if g.view_copy is None: + return None + f = g.view_copy + + name_split_by_period = self.name.split(".", maxsplit=2) + # Append a "_copy" to the base name of the operator (but keep the overload name the same) + view_copy_name = f"{name_split_by_period[0]}_copy." 
+ ".".join( + name_split_by_period[1:] + ) + view_copy_op_name = None if self.op is None else f"{self.op}_copy" + + return DifferentiabilityInfo( + # Use the "_copy" version of name/func/op + name=view_copy_name, + func=f, + op=view_copy_op_name, + # But keep all derivative info the same + derivatives=self.derivatives, + forward_derivatives=self.forward_derivatives, + all_saved_inputs=self.all_saved_inputs, + all_saved_outputs=self.all_saved_outputs, + available_named_gradients=self.available_named_gradients, + used_named_gradients=self.used_named_gradients, + args_with_derivatives=self.args_with_derivatives, + non_differentiable_arg_names=self.non_differentiable_arg_names, + output_differentiability=self.output_differentiability, + output_differentiability_conditions=self.output_differentiability_conditions, + ) + + +def uses_ident(info: DifferentiabilityInfo | None, ident: str) -> bool: + if info is None: + return False + for derivative in info.derivatives: + formula = derivative.formula + if re.search(IDENT_REGEX.format(ident), formula): + return True + return False + + +def uses_retain_variables(info: DifferentiabilityInfo | None) -> bool: + return uses_ident(info, "retain_variables") + + +def uses_single_grad(info: DifferentiabilityInfo | None) -> bool: + return uses_ident(info, "grad") + + +# Represents a differentiable `Argument`. +# How is it different from the `Argument` type? +# - It's processed Arguments which are differentiable and only used in the +# context of the autograd codegen; +# - It can represent SelfArgument or regular Argument but not TensorOptionsArgument; +@dataclass(frozen=True) +class DifferentiableInput: + name: str + type: Type + + # TODO: only to keep it byte-for-byte compatible with the old codegen, should remove. + cpp_type: str + + +# Represents a differentiable `Return`. +# How it it different from the `Return` type? +# - The name in `Return` is optional. Here it is always populated using the same +# `cpp.return_names()` method. +# TODO: some cpp naming logic (e.g. resolving name conflict) might be irrelevant? +# - It's processed Returns which are differentiable, in compliance with the +# `output_differentiability` field defined in derivatives.yaml (if specified), +# and are only used in the context of the autograd codegen; +@dataclass(frozen=True) +class DifferentiableOutput: + name: str + type: Type + + # TODO: only to keep it byte-for-byte compatible with the old codegen, should remove. + cpp_type: str + + +@dataclass(frozen=True) +class NativeFunctionWithDifferentiabilityInfo: + func: NativeFunction + info: dict[str, DifferentiabilityInfo] | None + fw_derivatives: dict[str, Sequence[ForwardDerivative]] | None + + +# TODO: Update comment below since it is out of date. +def dispatch_strategy(fn: NativeFunctionWithDifferentiabilityInfo) -> str: + """How are we going to call the underlying implementation of a + declaration? There are two strategies: + - use_derived: we want to call the implementation on CPUDoubleType + (or a similar, derived Type instance). Because these derived + instances deal in Tensors, not Variables (it's a completely different + object, so it doesn't dispatch back to VariableType), code on + this dispatch path needs to wrap/unwrap tensors. 
+      derived implementation takes and returns tensors, the
+      implementation is usually differentiable (although we also use
+      the derived dispatch path for non-differentiable functions
+      that we still want to dispatch on the derived Type instance;
+      e.g., size())
+    - use_type: we want to call the implementation on Type, because
+      it is implemented concretely, and the functions it invokes will
+      get dispatched back to VariableType (which will ensure that they
+      are differentiable.)
+    """
+    # fn is derived as long as any of its per-key differentiability infos
+    # has_derivatives. dispatch_strategy() is used to guard generation of fns in VariableType
+    # and ADInplaceOrViewType. We want to generate these functions as long as a
+    # derivative is defined for ANY dispatch key.
+    if fn.func.is_abstract or (
+        fn.info is not None and any(info.has_derivatives for info in fn.info.values())
+    ):
+        # If the function is abstract (not implemented on at::Type), we must
+        # call the implementation on the derived type with unpacked tensors.
+
+        # If the function has a derivative specified and is concrete, we could
+        # call either implementation. We prefer calling the derived
+        # type's implementation with unpacked tensors because it is more
+        # performant in some cases: any internal calls to other ATen functions
+        # won't have the history tracked.
+
+        # If the function has a type dispatched argument (i.e. is a factory),
+        # we prefer calling the derived type's implementation both because it is
+        # more performant and to ensure factory functions return tensors with _version
+        # of 0 (probably not strictly necessary, but nice to have to keep versions simple
+        # to understand).
+
+        return "use_derived"
+    else:
+        # If the function is concrete (we don't have to override it) and we
+        # didn't declare it in derivatives.yaml, we'll assume that it is
+        # actually implemented out of differentiable functions. (This
+        # assumption might not hold, but then you'll see gradcheck fail.)
+        return "use_type"
+
+
+def is_foreach_func(f: NativeFunction) -> bool:
+    return f.func.name.name.base.startswith("_foreach_")
+
+
+# note(crcrpar): Most foreach functions can reference an out-place `torch` function whose schema kind
+# is functional for their backward derivatives (and forward derivatives in the future), i.e.,
+# they would find such a function in `functional_info_by_signature`. There are, however, some exceptions:
+_foreach_with_inplace_ref = {"_foreach_zero_"}
+_foreach_with_tensor_overload = {
+    "_foreach_add.Tensor",
+    "_foreach_mul.Tensor",
+    "_foreach_div.Tensor",
+}
+# The following do not support the alpha kwarg, which the non-foreach versions support.
+_skip_argument_len_check = {
+    "_foreach_add.Scalar",
+    "_foreach_add_.Scalar",
+    "_foreach_add.ScalarList",
+    "_foreach_add_.ScalarList",
+    "_foreach_sub.Scalar",
+    "_foreach_sub_.Scalar",
+    "_foreach_sub.ScalarList",
+    "_foreach_sub_.ScalarList",
+}
+
+
+# Checks if `function_schema` is a native, non-foreach function that `f`, a foreach
+# function, references to generate derivatives.
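+# For example, `_foreach_add.List` lines up with the functional `add.Tensor`
+# schema: the base name matches after stripping "_foreach_", the non-out arity
+# matches, and each list element type matches its scalar/tensor counterpart
+# (an illustrative pairing).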
+def is_reference_for_foreach( + f: NativeFunction, + function_schema: FunctionSchema, +) -> bool: + return ( + f.func.name.name.base.split("_foreach_")[-1] == function_schema.name.name.base + and ( + not function_schema.name.name.inplace + or str(f.func.name) in _foreach_with_inplace_ref + ) + and ( + str(f.func.name) in _skip_argument_len_check + or len(f.func.arguments.flat_non_out) + == len(function_schema.arguments.flat_non_out) + ) + and all( + ref_arg.type in (arg.type, getattr(arg.type, "elem", None)) + for arg, ref_arg in zip( + f.func.arguments.flat_non_out, + function_schema.arguments.flat_non_out, + ) + ) + ) + + +# TODO(crcrpar): Avoid hard coding "Default" ideally. +def gen_foreach_derivativeinfo( + foreach_function: NativeFunction, + functional_info_by_signature: dict[ + FunctionSchema, dict[str, DifferentiabilityInfo] + ], + non_functional_info_by_signature: dict[ + FunctionSchema, dict[str, DifferentiabilityInfo] + ], + dispatch_key: str = "Default", +) -> tuple[DifferentiabilityInfo | None, bool]: + """Generate DifferentiabilityInfo for out-place foreach function, return the existing one for in-place. + + The second return value indicates whether the info is generated in this function. + """ + ref_diff_info: DifferentiabilityInfo | None = None + + for function_schema, diff_info in functional_info_by_signature.items(): + if not is_reference_for_foreach(foreach_function, function_schema): + continue + ref_diff_info = diff_info[dispatch_key] + if ref_diff_info is not None: + break + # note(crcrpar): It seems like `zero`'s info isn't available in functional_info_by_signature + # while the info of `zero_` is in non_functional_info_by_signature + if ( + ref_diff_info is None + and foreach_function.func.kind() == SchemaKind.inplace + and str(foreach_function.func.name) in _foreach_with_inplace_ref + ): + for function_schema, diff_info in non_functional_info_by_signature.items(): + if not is_reference_for_foreach(foreach_function, function_schema): + continue + ref_diff_info = diff_info[dispatch_key] + if ref_diff_info is not None: + break + if ref_diff_info is None: + return None, False + + # non out-place uses the existing Derivative. 
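+    # (In-place foreach ops reuse the reference info unchanged; only the
+    # out-of-place path below rewrites the formulas per-element.)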
+ if foreach_function.func.kind() == SchemaKind.inplace: + return ref_diff_info, False + + map_refarg2foreacharg, map_name2arg = {}, {} + for i, (arg, ref_arg) in enumerate( + zip( + foreach_function.func.arguments.flat_non_out, + function_schema.arguments.flat_non_out, + ) + ): + map_refarg2foreacharg[ref_arg.name] = arg.name + map_name2arg[arg.name] = arg + + all_saved_inputs, all_saved_outputs, all_var_names = [], [], [] + modified_derivative_formulas = [] + for i, derivative in enumerate(ref_diff_info.derivatives): + modified_formula = derivative.formula.replace("grad", "grads[i]").replace( + "result", "result[i]" + ) + saved_inputs, saved_outputs = [], [] + # note(crcrpar): This context seems necessary to call `cpp.argument_type` + with local.parametrize( + use_const_ref_for_mutable_tensors=foreach_function.use_const_ref_for_mutable_tensors, + use_ilistref_for_tensor_lists=foreach_function.part_of_structured_group, + ): + for ref_input in derivative.saved_inputs: + ref_input_jit_name = ref_input.expr.split(".")[0] + mapped_name = map_refarg2foreacharg[ref_input_jit_name] + if isinstance(map_name2arg[mapped_name].type, ListType): + mapped_expr = mapped_name + "[i]" + else: + mapped_expr = mapped_name + new_expr = ref_input.expr.replace(ref_input_jit_name, mapped_expr) + modified_formula = modified_formula.replace( + cast(str, ref_input.nctype.name), new_expr + ) + + nctype = cpp.argument_type(map_name2arg[mapped_name], binds=mapped_name) + canonical_nctype = NamedCType( + nctype.name, nctype.type.remove_const_ref() + ) + saved_inputs.append( + SavedAttribute(nctype=canonical_nctype, expr=mapped_name) + ) + for ref_output in derivative.saved_outputs: + if ref_output.nctype.name == "result": + saved_outputs.append( + SavedAttribute( + nctype=NamedCType( + name="result", type=BaseCType(tensorListT) + ), + expr="result", + ) + ) + else: + raise RuntimeError("") + var_names = [map_refarg2foreacharg[var] for var in derivative.var_names] + all_var_names.extend(var_names) + all_saved_inputs.extend(saved_inputs) + all_saved_outputs.extend(saved_outputs) + modified_derivative = Derivative( + formula=modified_formula, + original_formula=derivative.formula, + var_names=tuple(var_names), + saved_inputs=tuple(saved_inputs), + saved_outputs=tuple(saved_outputs), + named_gradients=set(), + ) + modified_derivative_formulas.append(modified_derivative) + + with local.parametrize( + use_const_ref_for_mutable_tensors=foreach_function.use_const_ref_for_mutable_tensors, + use_ilistref_for_tensor_lists=foreach_function.part_of_structured_group, + ): + args_with_derivatives = [ + Binding( + name=arg.name, + nctype=cpp.argument_type(arg, binds=arg.name), + argument=arg, + default=None, + ) + for arg in foreach_function.func.arguments.flat_non_out + if arg.name in all_var_names + ] + + forward_derivatives: list[ForwardDerivative] = [] + fw_derivative: ForwardDerivative + for fw_derivative in ref_diff_info.forward_derivatives: + var_names: list[str] = list(fw_derivative.var_names) # type: ignore[no-redef] + var_types: list[Type] = list(fw_derivative.var_types) + required_inputs_fw_grad: list[str] = [] + required_inputs_primal: list[str] = [] + if fw_derivative.required_inputs_fw_grad is not None: + required_inputs_fw_grad = list(fw_derivative.required_inputs_fw_grad) + if fw_derivative.required_inputs_primal: + required_inputs_primal = list(fw_derivative.required_inputs_primal) + modified_formula = fw_derivative.formula + + # Foreach's result is TensorList + if "result" in modified_formula: + modified_formula = 
fw_derivative.formula.replace("result", "result[i]") + + for foreach_arg, ref_arg in zip( + foreach_function.func.arguments.flat_non_out, + ref_diff_info.func.func.arguments.flat_non_out, + ): + # Modify reference forward formula + if ( + isinstance(foreach_arg.type, ListType) + and not foreach_arg.type.is_tensor_like() + ): + # Assuming ScalarList + modified_formula = modified_formula.replace( + ref_arg.name, foreach_arg.name + "[i]" + ) + elif foreach_arg.type.is_tensor_like(): + # Assuming TensorList / Tensor + # assert isinstance(foreach_arg.type, ListType), f"{foreach_function.func.name}, {foreach_arg.type}" + assert isinstance(foreach_arg.type, ListType) or ( + foreach_arg.type == BaseType(BaseTy.Tensor) + and str(foreach_function.func.name) in _foreach_with_tensor_overload + ), f"{foreach_function.func.name}, {foreach_arg.type}" + for suffix in ("_p", "_t"): + curr_expr = ref_arg.name + suffix + if curr_expr in modified_formula: + new_expr = foreach_arg.name + suffix + modified_formula = modified_formula.replace(curr_expr, new_expr) + else: + # Assuming Scalar + if foreach_arg.name != ref_arg.name: + modified_formula = modified_formula.replace( + ref_arg.name, foreach_arg.name + ) + + # note(crcrpar): there should exist a cooler way... + for i, name in enumerate(var_names): + if name == ref_arg.name: + var_names[i] = foreach_arg.name + var_types[i] = foreach_arg.type + for i, name in enumerate(required_inputs_fw_grad): + if name == ref_arg.name: + required_inputs_fw_grad[i] = foreach_arg.name + for i, name in enumerate(required_inputs_primal): + if name == ref_arg.name: + required_inputs_primal[i] = foreach_arg.name + forward_derivatives.append( + ForwardDerivative( + formula=modified_formula, + var_names=tuple(var_names), + var_types=tuple(var_types), + required_inputs_fw_grad=tuple(required_inputs_fw_grad), + required_inputs_primal=tuple(required_inputs_primal), + required_original_self_value=fw_derivative.required_original_self_value, + is_reusing_outplace_formula=fw_derivative.is_reusing_outplace_formula, + ) + ) + + return ( + DifferentiabilityInfo( + name=foreach_function.func.name.name.base, + func=foreach_function, + op=f"Foreach{ref_diff_info.op}{foreach_function.func.name.overload_name}", + derivatives=modified_derivative_formulas, + forward_derivatives=forward_derivatives, + all_saved_inputs=tuple(set(all_saved_inputs)), + all_saved_outputs=tuple(set(all_saved_outputs)), + available_named_gradients=(), + used_named_gradients=set(), + args_with_derivatives=args_with_derivatives, + non_differentiable_arg_names=[], + output_differentiability=None, + output_differentiability_conditions=None, + ), + True, + ) + + +def match_differentiability_info( + native_functions: list[NativeFunction], + differentiability_infos: dict[FunctionSchema, dict[str, DifferentiabilityInfo]], +) -> list[NativeFunctionWithDifferentiabilityInfo]: + """Sets the "derivative" key on declarations to matching autograd function + In-place functions will use the out-of-place derivative definition if there + is no in-place specific derivative. 
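+    (For example, mul_() would fall back to mul()'s out-of-place formula when
+    derivatives.yaml only defines the functional entry; an illustrative case.)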
+ """ + + functional_info_by_signature = { + schema.signature(strip_default=True): info_dict + for schema, info_dict in differentiability_infos.items() + if schema.kind() == SchemaKind.functional + } + non_functional_info_by_signature = { + schema.signature(strip_default=True): info_dict + for schema, info_dict in differentiability_infos.items() + if schema.kind() != SchemaKind.functional + } + + def find_info( + f: NativeFunction, + ) -> tuple[dict[str, DifferentiabilityInfo] | None, bool]: + # Don't bother matching info to generated out= variants + if "generated" in f.tags and f.func.kind() == SchemaKind.out: + return None, False + + # (1) Check for an exact match + if f.func in differentiability_infos: + return differentiability_infos[f.func], True + + # (2) If no exact match, check if the out-of-place variant + # of this operator has a match. + # i.e mul() for mul_() or mul_out() + # note(crcrpar): Check foreach or not because in-place foreach functions use backward defined for the existing + # native functions instead of the out-place counterparts. + f_sig = f.func.signature(strip_default=True) + if f_sig in functional_info_by_signature and not is_foreach_func(f): + return functional_info_by_signature[f_sig], False + + # (3) Some operators have a derivative explicitly defined for the mutable + # variant, but get a code-generated out-of-place variant which does *not* + # come with a derivative formula. + # For the generated out-of-place variant, use the mutable variant's formula + # if it exists. + if "generated" in f.tags and f_sig in non_functional_info_by_signature: + info_dict = non_functional_info_by_signature[f_sig] + # See https://github.com/pytorch/pytorch/pull/76320/files#r874816389 + assert not any( + any("self" in str(input.nctype.name) for input in info.all_saved_inputs) + for info in info_dict.values() + ), f"""\ +Attempted to convert a derivative formula for a mutable operator + to be used by automatically by its functional variant ("{str(f.func)}"). + this is not currently supported (we'd need to fix up the formula in the codegen).""" + return info_dict, False + + # (4) Generate derivative information of foreach functions if none is defined in `derivatives.yaml` + if is_foreach_func(f): + assert f.func not in differentiability_infos + diff_info, is_generated = gen_foreach_derivativeinfo( + f, + functional_info_by_signature, + non_functional_info_by_signature, + ) + if diff_info is None: + return None, False + # TODO(crcrpar): Avoid hard coding "Default" ideally. + diff_info_dict = {"Default": diff_info} + if is_generated: + differentiability_infos[f.func] = diff_info_dict + functional_info_by_signature[f.func] = diff_info_dict + return diff_info_dict, is_generated + + return None, False + + result: list[NativeFunctionWithDifferentiabilityInfo] = [] + for f in native_functions: + info_dict, is_exact_match = find_info(f) + + # Currently, the '.strides()' to 'strides_or_error' replacement does not support + # 'self' derivatives of an inplace function, so we must check for this case. 
+        if f.func.kind() == SchemaKind.inplace and (info_dict is not None):
+            for info in info_dict.values():
+                for derivative in info.derivatives:
+                    if "self" in derivative.var_names:
+                        for saved_input in derivative.saved_inputs:
+                            assert "strides_or_error" not in saved_input.expr, (
+                                "Calling '.strides()' in the 'self' derivative formula of an "
+                                f"in-place function is not supported: {f.func}"
+                            )
+
+        if not info_dict:
+            result.append(
+                NativeFunctionWithDifferentiabilityInfo(
+                    func=f, info=None, fw_derivatives=None
+                )
+            )
+            continue
+
+        fw_derivative_dict: dict[str, Sequence[ForwardDerivative]] = {}
+        for key, info in info_dict.items():
+            if not info.forward_derivatives:
+                fw_derivative_dict[key] = []
+                continue
+
+            forward_derivatives = info.forward_derivatives
+
+            # For functions that have a single def for out-of-place and inplace (like abs())
+            if f.func.kind() == SchemaKind.inplace:
+                # For inplace functions there is a little bit of work to do:
+                #  1) Validate the formula and make sure the input that is modified is not used:
+                #    - If there is a formula for the inplace variant of the function (is_exact_match == True) then
+                #      we make sure that the original value of the input that is being modified inplace (self_p) is
+                #      not used in the formula. Note that the formula can use "original_self_p" here and that would
+                #      trigger a clone of the original input.
+                #    - If we are reusing the out of place formula (is_exact_match == False) then we replace every
+                #      occurrence of self_p and self_t by original_self_p and original_self_t. These will be
+                #      populated by a cloned version of the original input (either the clone done by the backward AD
+                #      logic if self is also used in a backward formula or a special clone that we add).
+                #  2) At this point, there cannot be a self_p in the formula.
+                #  3) Change "result" into "self_p" as by design, in the inplace function codegen, the result is
+                #     simply called self (as it is modified inplace).
+                #  4) Update the required primals data in case it used to contain "result" but should now contain
+                #     "self"
+                #  5) If it is not an exact match, the user formula is not modifying the existing forward grad
+                #     inplace as it should. So add some code that makes sure that we do so if the forward grad
+                #     already exists.
+
+                assert (
+                    len(info.forward_derivatives) == 1
+                )  # Only single output inplace should exist
+                fw_info = info.forward_derivatives[0]
+                formula = fw_info.formula
+
+                def replace_self_with_original_self(formula: str, postfix: str) -> str:
+                    def repl(m: re.Match[str]) -> str:
+                        return f"{m.group(1)}original_self{postfix}{m.group(2)}"
+
+                    return re.sub(IDENT_REGEX.format(f"self{postfix}"), repl, formula)
+
+                if re.search(IDENT_REGEX.format("self_p"), formula):
+                    if is_exact_match:
+                        # For manually defined formulas, don't allow the original value to be used
+                        raise RuntimeError(
+                            f'The formula for "{f.func.name}" is using the original value of self '
+                            "that is being modified inplace. This would lead to wrong forward gradients. "
+                            'Please use "result" in the formula only.'
+ ) + else: + # When the original formula is out of place, we save a clone of the primal + # value to be able to access this value if needed + # replace "self_p"/"self_t" from the formula by "original_self_p"/"original_self_t" + formula = replace_self_with_original_self(formula, "_p") + formula = replace_self_with_original_self(formula, "_t") + + # replace "result" from the formula by "self_p" + def repl(m: re.Match[str]) -> str: + return f"{m.group(1)}self_p{m.group(2)}" + + formula = re.sub(IDENT_REGEX.format("result"), repl, formula) + + required_primals = fw_info.required_inputs_primal + if re.search(IDENT_REGEX.format("self_p"), formula): + required_primals = ( + required_primals + ("self",) if required_primals else ("self",) + ) + + if not is_exact_match: + # NOTE [In-place forward AD formula Optimization] + # + # This optimization transforms the formula to directly do inplace, i.e. + # instead of self_t.copy_(self_t.op()) we do self_t.op_() when the following are met: + # + # 1) the formula satisfies the pattern: "self_t.op(*args)" + # 2) "op" in (1) needs to be the same as the op the derivative is for + # + # (2) may seem too strict, but currently the only ops that satisfy (1) also satisfy (2) + # If there is a need, we can relax (2) to allow any op that has an in-place variant + is_single_method_on_self_t = False + directly_do_inplace = False + op_name: str | None = None + between_parens: str | None = None + match = re.fullmatch(r"self_t.([\w]*)\((.*)\)", formula) + if match: + op_name, between_parens = match.group(1), match.group(2) + + # We want to... + # Match: self_t.op1(other_p.op2(arg)) + # Avoid: self_t.op1(args) + self_t.op2(args) + # Avoid: self_t.op1(other_p.op2(arg)) + self_t.op2(args) + def check_parens_nest_level_gt_zero(s: str) -> bool: + level = 1 + for ch in s: + if ch == ")": + level -= 1 + if level == 0: + return False + if ch == "(": + level += 1 + return True + + is_single_method_on_self_t = check_parens_nest_level_gt_zero( + between_parens + ) + directly_do_inplace = ( + is_single_method_on_self_t and op_name == info.name + ) + + if directly_do_inplace: + assert op_name is not None + assert between_parens is not None + formula = f"self_t_raw.defined() ? self_t_raw.{op_name}_({between_parens}) : {formula}" + else: + # Make sure that the forward grad is modified inplace when the original formula + # is out of place + formula = f"self_t_raw.defined() ? 
self_t_raw.copy_({formula}) : {formula}" + + required_original_self_value = bool( + re.search(IDENT_REGEX.format("original_self_p"), formula) + ) or bool(re.search(IDENT_REGEX.format("original_self_t"), formula)) + + forward_derivatives = [ + ForwardDerivative( + formula=formula, + var_names=("self",), + var_types=fw_info.var_types, + required_inputs_fw_grad=fw_info.required_inputs_fw_grad, + required_inputs_primal=required_primals, + required_original_self_value=required_original_self_value, + is_reusing_outplace_formula=not is_exact_match, + ), + ] + + fw_derivative_dict[key] = forward_derivatives + + result.append( + NativeFunctionWithDifferentiabilityInfo( + func=f, info=info_dict, fw_derivatives=fw_derivative_dict + ) + ) + + return result + + +def is_differentiable( + name: str, type: Type, info: DifferentiabilityInfo | None +) -> bool: + return type.is_tensor_like() and ( + info is None or name not in info.non_differentiable_arg_names + ) + + +def gen_differentiable_outputs( + fn: NativeFunctionWithDifferentiabilityInfo, key: str = "Default" +) -> list[DifferentiableOutput]: + f = fn.func + info = fn.info[key] if fn.info else None + outputs: list[DifferentiableOutput] = [ + DifferentiableOutput( + name=name, + type=ret.type, + cpp_type=cpp.return_type(ret, symint=True).cpp_type(), + ) + for name, ret in zip(cpp.return_names(f), f.func.returns) + ] + output_differentiability = info.output_differentiability if info else None + if output_differentiability is not None: + if len(output_differentiability) != len(outputs): + raise RuntimeError( + f"The length of output_differentiability ({len(output_differentiability)}), " + f"does not match the number of outputs ({len(outputs)})." + ) + differentiable_outputs: list[DifferentiableOutput] = [] + if False in output_differentiability and f.func.kind() == SchemaKind.inplace: + raise RuntimeError( + "output_differentiability=False for inplace operation (version_counter won't get updated)" + ) + for differentiable, output in zip(output_differentiability, outputs): + if differentiable: + differentiable_outputs.append(output) + return differentiable_outputs + candidate_differentiable_outputs = list( + filter(lambda r: is_differentiable(r.name, r.type, info), outputs) + ) + if uses_single_grad(info): + return candidate_differentiable_outputs[:1] + else: + return candidate_differentiable_outputs diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/cpp.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/cpp.py new file mode 100644 index 0000000000000000000000000000000000000000..862cef30dba49f4341a3c980845fdb7a2c1cbcd5 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/cpp.py @@ -0,0 +1,469 @@ +from __future__ import annotations + +from typing import TYPE_CHECKING +from typing_extensions import assert_never + +from torchgen import local +from torchgen.api.types import ( + ArgName, + ArrayCType, + ArrayRefCType, + BaseCType, + BaseTypeToCppMapping, + Binding, + boolT, + ConstRefCType, + CType, + dimnameListT, + intArrayRefT, + iTensorListRefT, + ListCType, + longT, + MutRefCType, + NamedCType, + OptionalCType, + optionalIntArrayRefT, + optionalSymIntArrayRefT, + scalarT, + SpecialArgName, + symIntArrayRefT, + SymIntT, + tensorListT, + tensorOptionsT, + tensorT, + TupleCType, + VectorCType, + voidT, +) +from torchgen.model import ( + Argument, + Arguments, + BaseTy, + BaseType, + FunctionSchema, + ListType, + NativeFunction, + OptionalType, + Return, + SelfArgument, + TensorOptionsArguments, + 
Type, +) + + +if TYPE_CHECKING: + from collections.abc import Sequence + + +# This file describes the translation of JIT schema to the public C++ +# API, which is what people use when they call functions like at::add. +# +# Prominent characteristics of the C++ API: +# +# - dtype, layout, device and pin_memory are collected into +# a single C++ type TensorOptions (the native functions API +# also has this, but tensor options is really most relevant +# for the C++ API; it makes calling kwarg factory functions +# pleasant) +# +# - defaulting lives here (in fact, the dispatcher is completely +# oblivious of defaults!) +# +# BTW: policy on name collisions: we try not to have types with +# collisions, but functions are fair game to collide + + +def name( + func: FunctionSchema, + *, + faithful_name_for_out_overloads: bool = False, + symint_overload: bool = False, +) -> str: + name = str(func.name.name) + if symint_overload: + name += "_symint" + if func.is_out_fn(): + if faithful_name_for_out_overloads: + name += "_outf" + else: + name += "_out" + + return name + + +# Translation of "value types" in JIT schema to C++ API type. Value +# types look the same no matter if they are argument types or return +# types. Returns None if the type in question is not a value type. +def valuetype_type( + t: Type, + *, + binds: ArgName, + mutable: bool = True, + symint: bool = False, +) -> NamedCType | None: + if isinstance(t, BaseType): + if t.name in (BaseTy.Tensor, BaseTy.Scalar): + return None + elif str(t) == "SymInt": + if symint: + return NamedCType(binds, BaseCType(SymIntT)) + else: + return NamedCType(binds, BaseCType(longT)) + # All other BaseType currently map directly to BaseCppTypes. + return NamedCType(binds, BaseCType(BaseTypeToCppMapping[t.name])) + elif isinstance(t, OptionalType): + elem = valuetype_type(t.elem, binds=binds, mutable=mutable, symint=symint) + if elem is None: + return None + return NamedCType(binds, OptionalCType(elem.type)) + elif isinstance(t, ListType): + if str(t.elem) == "bool": + assert t.size is not None + return NamedCType(binds, ArrayCType(BaseCType(boolT), t.size)) + else: + return None + else: + raise AssertionError(f"unrecognized type {repr(t)}") + + +# Translation of types occurring in JIT arguments to a C++ argument type. +# If remove_non_owning_ref_types is set, we'll guarantee that the output CType is not a non-owning reference type. +# For example, we'll return std::vector instead of IntArrayRef. 
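+# (Concretely: an `int[]` argument maps to at::IntArrayRef by default, and to
+# ::std::vector<int64_t> when remove_non_owning_ref_types=True; a summary of
+# the list-type branches below.)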
+# See Note [translation from C++ reference to value types] +def argumenttype_type( + t: Type, + *, + mutable: bool, + binds: ArgName, + remove_non_owning_ref_types: bool = False, + symint: bool = False, +) -> NamedCType: + # If it's a value type, do the value type translation + r = valuetype_type( + t, + binds=binds, + mutable=mutable, + symint=symint, + ) + if r is not None: + return r + + if isinstance(t, BaseType): + if t.name == BaseTy.Tensor: + if mutable and not local.use_const_ref_for_mutable_tensors(): + return NamedCType(binds, MutRefCType(BaseCType(tensorT))) + else: + return NamedCType(binds, ConstRefCType(BaseCType(tensorT))) + elif t.name == BaseTy.Scalar: + return NamedCType(binds, ConstRefCType(BaseCType(scalarT))) + else: + raise AssertionError(f"base type should have been value type {t}") + elif isinstance(t, OptionalType): + if str(t.elem) == "Tensor": + if mutable and not local.use_const_ref_for_mutable_tensors(): + return NamedCType( + binds, MutRefCType(BaseCType(tensorT)) + ) # TODO: fix this discrepancy + else: + return NamedCType( + binds, ConstRefCType(OptionalCType(BaseCType(tensorT))) + ) + elif str(t.elem) == "Scalar": + return NamedCType(binds, ConstRefCType(OptionalCType(BaseCType(scalarT)))) + elif isinstance(t.elem, ListType) and str(t.elem.elem) == "int": + return NamedCType(binds, BaseCType(optionalIntArrayRefT)) + elif isinstance(t.elem, ListType) and str(t.elem.elem) == "SymInt": + if symint: + return NamedCType(binds, BaseCType(optionalSymIntArrayRefT)) + else: + return NamedCType(binds, BaseCType(optionalIntArrayRefT)) + elem = argumenttype_type(t.elem, mutable=mutable, binds=binds, symint=symint) + return NamedCType(binds, OptionalCType(elem.type)) + elif isinstance(t, ListType): + # TODO: remove these special cases, ArrayRef fallthrough works fine + if str(t.elem) == "int": + if remove_non_owning_ref_types: + return NamedCType(binds, VectorCType(BaseCType(longT))) + else: + return NamedCType(binds, BaseCType(intArrayRefT)) + if str(t.elem) == "SymInt": + if remove_non_owning_ref_types: + if symint: + return NamedCType(binds, VectorCType(BaseCType(SymIntT))) + else: + return NamedCType(binds, VectorCType(BaseCType(longT))) + else: + if symint: + return NamedCType(binds, BaseCType(symIntArrayRefT)) + else: + return NamedCType(binds, BaseCType(intArrayRefT)) + if str(t.elem) == "Tensor": + if local.use_ilistref_for_tensor_lists(): + return NamedCType(binds, ConstRefCType(BaseCType(iTensorListRefT))) + else: + return NamedCType(binds, BaseCType(tensorListT)) + elif str(t.elem) == "Scalar": + return NamedCType(binds, ArrayRefCType(BaseCType(scalarT))) + elif str(t.elem) == "Dimname": + return NamedCType(binds, BaseCType(dimnameListT)) + elif str(t.elem) == "Tensor?": + return NamedCType( + binds, ConstRefCType(ListCType(OptionalCType(BaseCType(tensorT)))) + ) + elem = argumenttype_type(t.elem, mutable=mutable, binds=binds, symint=symint) + return NamedCType(binds, ArrayRefCType(elem.type)) + else: + raise AssertionError(f"unrecognized type {repr(t)}") + + +# Translate a JIT argument into its C++ type +def argument_type(a: Argument, *, binds: ArgName, symint: bool = False) -> NamedCType: + return argumenttype_type(a.type, mutable=a.is_write, symint=symint, binds=binds) + + +# Translation of a (non-multi) return type from JIT to C++ +# N.B: returntype_type returns a CType, not a NamedCType. +# This is mostly because of the mismatch between return types and return names. +# e.g. 
a function with a return type of 'void' has 0 return names,
+# and a function with a return type of 'std::tuple' has >1 return name.
+def returntype_type(t: Type, *, mutable: bool, symint: bool = False) -> CType:
+    # placeholder is ignored
+    # NB: symint is ALWAYS respected for return types. So symint argument
+    # here is IGNORED
+    r = valuetype_type(t, binds="__placeholder__", mutable=mutable, symint=True)
+    if r is not None:
+        return r.type
+
+    if isinstance(t, BaseType):
+        if t.name == BaseTy.Tensor:
+            if mutable:
+                if local.use_const_ref_for_mutable_tensors():
+                    return ConstRefCType(BaseCType(tensorT))
+                else:
+                    return MutRefCType(BaseCType(tensorT))
+            else:
+                # Note [Tensor Copy Returns]
+                # Currently, we use "Argument.is_write" to determine
+                # whether or not Tensor return types should be copies or references.
+                # If that ever changes, take a look at other locations of this note!
+                return BaseCType(tensorT)
+        elif t.name == BaseTy.Scalar:
+            return BaseCType(scalarT)
+    elif isinstance(t, ListType):
+        assert not mutable, (
+            "Native functions should never return a mutable tensor list. They should return void."
+        )
+        elem = returntype_type(t.elem, mutable=False)
+        assert t.size is None, f"fixed size list returns not supported: {t}"
+        return VectorCType(elem)
+    elif isinstance(t, OptionalType):
+        elem = returntype_type(t.elem, mutable=mutable)
+        if str(t.elem) == "Tensor":
+            return OptionalCType(elem)
+
+    raise AssertionError(f"unrecognized return type {t}")
+
+
+# Translation of a single return to its C++ type
+def return_type(r: Return, *, symint: bool = False) -> CType:
+    return returntype_type(r.type, mutable=r.is_write, symint=symint)
+
+
+# Translation of a full (possibly multi) return from JIT to its C++ type
+def returns_type(rs: Sequence[Return], *, symint: bool = False) -> CType:
+    if len(rs) == 0:
+        return BaseCType(voidT)
+    elif len(rs) == 1:
+        return return_type(rs[0], symint=symint)
+    else:
+        return TupleCType([return_type(r, symint=symint) for r in rs])
+
+
+def return_names(f: NativeFunction, *, fallback_name: str = "result") -> Sequence[str]:
+    returns: list[str] = []
+    for i, r in enumerate(f.func.returns):
+        # If we have an inplace function, the return argument is
+        # implicitly named self.
+        # TODO: Consider incorporating this into the data model
+        if f.func.name.name.inplace:
+            assert i == 0, "illegal inplace function with multiple returns"
+            name = "self"
+        # If we are an out function, the name is the name of the
+        # corresponding output argument (r.name will get recorded
+        # in field_name later.)
+        elif f.func.is_out_fn():
+            name = f.func.arguments.out[i].name
+        # If the return argument is explicitly named...
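+        # (appending a "_return" suffix when the name collides with an input
+        # argument, as the branch below does)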
+ elif r.name: + name_conflict = any( + r.name == a.name for a in f.func.schema_order_arguments() + ) + if name_conflict and not f.func.is_out_fn(): + name = f"{r.name}_return" + else: + name = r.name + # If there is no explicit name and no fallback name was passed in, we just name the output result, + # unless it's a multi-return, in which case it's result0, + # result1, etc (zero-indexed) + else: + name = fallback_name if len(f.func.returns) == 1 else f"{fallback_name}{i}" + returns.append(name) + return returns + + +JIT_TO_CPP_DEFAULT = { + "False": "false", + "True": "true", + "None": "::std::nullopt", # UGH this one is type directed + "Mean": "at::Reduction::Mean", + "[]": "{}", + "contiguous_format": "c10::MemoryFormat::Contiguous", + "long": "at::kLong", +} + + +# Convert a JIT default into C++ expression representing the default +def default_expr(d: str, t: Type, *, symint: bool) -> str: + if d == "None" and str(t) == "Tensor?": + return "{}" + if isinstance(t, BaseType) and t.name is BaseTy.str: + # Schema allows single quotes but C++ needs double + if len(d) >= 2 and d[0] == "'" and d[-1] == "'": + s = "" + i = 1 + while i + 1 < len(d): + if d[i] != "\\": + if d[i] == '"': + s += '\\"' + else: + s += d[i] + i += 1 + else: + if d[i + 1] == "'": + s += "'" + else: + s += d[i : i + 2] + i += 2 + + return f'"{s}"' + + if isinstance(t, OptionalType): + if d == "None": + return "::std::nullopt" + + return default_expr(d, t.elem, symint=symint) + + if isinstance(t, ListType): + if d.startswith("[") and d.endswith("]"): + return "{" + d[1:-1] + "}" + elif symint and d.isdigit() and str(t.elem) == "SymInt": + return f"c10::SymInt({d})" + elif t.size is None: + # NOTE: Sized lists can have scalar defaults + raise ValueError(f"Expected a list default '[...]' but found: '{d}'") + + return JIT_TO_CPP_DEFAULT.get(d, d) + + +# Convert an argument into its C++ API form + + +def argument( + a: Argument | TensorOptionsArguments | SelfArgument, + *, + cpp_no_default_args: set[str], + method: bool, + faithful: bool, + symint: bool = False, + has_tensor_options: bool, +) -> list[Binding]: + def sub_argument( + a: Argument | TensorOptionsArguments | SelfArgument, + ) -> list[Binding]: + return argument( + a, + cpp_no_default_args=cpp_no_default_args, + method=method, + faithful=faithful, + symint=symint, + has_tensor_options=has_tensor_options, + ) + + if isinstance(a, Argument): + binds: ArgName + if a.name == "memory_format" and has_tensor_options: + binds = SpecialArgName.possibly_redundant_memory_format + else: + binds = a.name + default: str | None = None + if a.name not in cpp_no_default_args and a.default is not None: + default = default_expr(a.default, a.type, symint=symint) + return [ + Binding( + nctype=argument_type(a, binds=binds, symint=symint), + name=a.name, + default=default, + argument=a, + ) + ] + elif isinstance(a, TensorOptionsArguments): + if faithful: + return ( + sub_argument(a.dtype) + + sub_argument(a.layout) + + sub_argument(a.device) + + sub_argument(a.pin_memory) + ) + else: + default = None + # Enforced by NativeFunction.__post_init__ + assert "options" not in cpp_no_default_args + if all(x.default == "None" for x in a.all()): + default = "{}" + elif a.dtype.default == "long": + default = "at::kLong" # TODO: this is wrong + return [ + Binding( + nctype=NamedCType("options", BaseCType(tensorOptionsT)), + name="options", + default=default, + argument=a, + ) + ] + elif isinstance(a, SelfArgument): + if method: + # Caller is responsible for installing implicit this in context! 
+            return []
+        else:
+            return sub_argument(a.argument)
+    else:
+        assert_never(a)
+
+
+def arguments(
+    arguments: Arguments,
+    *,
+    faithful: bool,
+    symint: bool = False,
+    method: bool,
+    cpp_no_default_args: set[str],
+) -> list[Binding]:
+    args: list[Argument | TensorOptionsArguments | SelfArgument] = []
+    if faithful:
+        args.extend(arguments.non_out)
+        args.extend(arguments.out)
+    else:
+        args.extend(arguments.out)
+        args.extend(arguments.non_out)
+    return [
+        r.no_default() if faithful else r
+        for a in args
+        for r in argument(
+            a,
+            faithful=faithful,
+            symint=symint,
+            method=method,
+            has_tensor_options=arguments.tensor_options is not None,
+            cpp_no_default_args=cpp_no_default_args,
+        )
+    ]
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/dispatcher.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/dispatcher.py
new file mode 100644
index 0000000000000000000000000000000000000000..fcca7a60fec1829c5783197055733467fcdd63fe
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/dispatcher.py
@@ -0,0 +1,125 @@
+from __future__ import annotations
+
+import itertools
+from typing import TYPE_CHECKING
+from typing_extensions import assert_never
+
+from torchgen.api import cpp
+from torchgen.api.types import ArgName, Binding, CType, NamedCType
+from torchgen.model import (
+    Argument,
+    FunctionSchema,
+    Return,
+    SelfArgument,
+    TensorOptionsArguments,
+    Type,
+)
+from torchgen.utils import concatMap
+
+
+if TYPE_CHECKING:
+    from collections.abc import Sequence
+
+
+# This file describes the translation of JIT schema to the dispatcher
+# API, the *unboxed* calling convention by which invocations through
+# the dispatcher are made. Historically, the dispatcher API matched
+# the C++ API, but with the establishment of the boxed API, we've
+# made changes to the dispatcher API so that the unboxed API
+# better aligns with the boxed API. The dispatcher API hooks heavily
+# into our template based boxing/unboxing machinery, so changes
+# to this convention will usually need template updates too.
+#
+# Prominent characteristics of the dispatcher API:
+#
+# - dtype, layout, device and pin_memory are represented as separate
+#   arguments.
+#
+
+
+def name(func: FunctionSchema) -> str:
+    return cpp.name(func)
+
+
+def argumenttype_type(
+    t: Type,
+    *,
+    mutable: bool,
+    binds: ArgName,
+    remove_non_owning_ref_types: bool = False,
+    symint: bool = True,
+) -> NamedCType:
+    # This is a faux ami. If it makes sense in the future to add
+    # more special cases here, or invert things so cpp.argument_type
+    # calls this, or just completely inline the function, please do
+    # it.
+    return cpp.argumenttype_type(
+        t,
+        mutable=mutable,
+        binds=binds,
+        symint=symint,
+        remove_non_owning_ref_types=remove_non_owning_ref_types,
+    )
+
+
+def argument_type(
+    a: Argument,
+    *,
+    binds: ArgName,
+    remove_non_owning_ref_types: bool = False,
+    symint: bool = True,
+) -> NamedCType:
+    return argumenttype_type(
+        a.type,
+        mutable=a.is_write,
+        binds=binds,
+        remove_non_owning_ref_types=remove_non_owning_ref_types,
+        symint=symint,
+    )
+
+
+def returns_type(rs: Sequence[Return], *, symint: bool = True) -> CType:
+    # At present, there is no difference. But there could be!
+    return cpp.returns_type(rs, symint=symint)
+
+
+def jit_arguments(func: FunctionSchema) -> list[Argument]:
+    def to_argument(
+        a: Argument | TensorOptionsArguments | SelfArgument,
+    ) -> list[Argument]:
+        if isinstance(a, Argument):
+            return [a]
+        elif isinstance(a, SelfArgument):
+            return [a.argument]
+        elif isinstance(a, TensorOptionsArguments):
+            return [a.dtype, a.layout, a.device, a.pin_memory]
+        else:
+            assert_never(a)
+
+    return list(
+        concatMap(
+            to_argument,
+            itertools.chain(
+                func.arguments.positional, func.arguments.kwarg_only, func.arguments.out
+            ),
+        )
+    )
+
+
+def argument(
+    a: Argument, *, remove_non_owning_ref_types: bool = False, symint: bool = True
+) -> Binding:
+    return Binding(
+        nctype=argument_type(
+            a,
+            binds=a.name,
+            remove_non_owning_ref_types=remove_non_owning_ref_types,
+            symint=symint,
+        ),
+        name=a.name,
+        argument=a,
+    )
+
+
+def arguments(func: FunctionSchema, *, symint: bool = True) -> list[Binding]:
+    return [argument(a, symint=symint) for a in jit_arguments(func)]
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/functionalization.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/functionalization.py
new file mode 100644
index 0000000000000000000000000000000000000000..f4b46b5f14760b2eca447536a1795ade807f89d5
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/functionalization.py
@@ -0,0 +1,215 @@
+from __future__ import annotations
+
+from torchgen.api import dispatcher
+from torchgen.api.types import (
+    BaseCppType,
+    BaseCType,
+    Binding,
+    boolT,
+    ConstRefCType,
+    CType,
+    longT,
+    NamedCType,
+    tensorT,
+)
+from torchgen.model import (
+    Argument,
+    BaseTy,
+    BaseType,
+    FunctionSchema,
+    NativeFunction,
+    NativeFunctionsViewGroup,
+)
+
+
+# This file describes the translation of JIT schema to APIs used
+# when creating `ViewMeta` specializations that are used by the functionalization pass.
+# These APIs mostly follow the dispatcher API, with one difference:
+# - While the forward function just directly calls into the at::_ops API
+#   (following the dispatcher convention), the logic here for the reverse function
+#   is responsible for generating both the call-site and the declarations
+#   (which are implemented manually in the at::functionalization::impl namespace).

+# Define some specific lambda input arguments.
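+# (These bindings recur across the generated `ViewMeta` specializations below;
+# each is a plain Binding so it can be reused for both declarations and
+# call-sites.)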
+base_binding = Binding( + name="base", + nctype=NamedCType(name="base", type=ConstRefCType(BaseCType(tensorT))), + argument=Argument( + name="base", type=BaseType(BaseTy.Tensor), default=None, annotation=None + ), + default=None, +) + +has_symbolic_inputs_binding = Binding( + name="has_symbolic_inputs", + nctype=NamedCType(name="has_symbolic_inputs", type=BaseCType(boolT)), + argument=Argument( + name="has_symbolic_inputs", + type=BaseType(BaseTy.bool), + default=None, + annotation=None, + ), + default=None, +) +mutated_view_binding = Binding( + name="mutated_view", + nctype=NamedCType(name="mutated_view", type=ConstRefCType(BaseCType(tensorT))), + argument=Argument( + name="base", type=BaseType(BaseTy.Tensor), default=None, annotation=None + ), + default=None, +) +out_index_binding = Binding( + name="out_index", + nctype=NamedCType(name="out_index", type=BaseCType(longT)), + argument=Argument( + name="out_index", type=BaseType(BaseTy.int), default=None, annotation=None + ), + default=None, +) +reapply_views_binding = Binding( + name="reapply_views", + nctype=NamedCType(name="reapply_views", type=BaseCType(boolT)), + argument=Argument( + name="reapply_views", type=BaseType(BaseTy.bool), default=None, annotation=None + ), + default=None, +) + +InverseReturnModeT = BaseCppType("at::functionalization", "InverseReturnMode") +inverse_return_mode_binding = Binding( + name="inverse_return_mode", + nctype=NamedCType(name="inverse_return_mode", type=BaseCType(InverseReturnModeT)), + argument=Argument( + name="inverse_return_mode", + # NB: not actually a bool but it doesn't matter because this isn't used + type=BaseType(BaseTy.bool), + default=None, + annotation=None, + ), + default=None, +) + + +# Name of the `ViewMeta` specialization class created. +def classname(func: FunctionSchema, with_namespace: bool = False) -> str: + namespace = "at::functionalization::" if with_namespace else "" + return f"{namespace}{func.name.unambiguous_name()}_ViewMeta" + + +# Name of the operation called inside the `forward`/`reverse` implementations. +def name( + g: NativeFunctionsViewGroup, + *, + is_reverse: bool, + include_namespace: bool, + reapply_views: bool | None = None, +) -> str: + if reapply_views is None: + # reapply_views is only important for the fwd lambda, + # since we always plumb the runtime "reapply_views" argument into the reverse function. + assert is_reverse + if is_reverse: + return reverse_name(g.view, include_namespace) + # in the forward case, we just directly call into the at::_ops API (so we always need the namespace) + assert include_namespace + assert g.view_copy is not None + api_name = ( + g.view.func.name.unambiguous_name() + if reapply_views + else g.view_copy.func.name.unambiguous_name() + ) + return f"at::_ops::{api_name}::call" + + +def reverse_name(f: NativeFunction, include_namespace: bool) -> str: + # for the reverse: we plumb the "reapply_views" flag into that function and support + # both copy and non-copy variants. (We could avoid doing that, but that would require + # writing out twice as many view inverse functions). 
+    api_name = f.func.name.unambiguous_name()
+    # in the reverse case, we codegen both the call-sites (which need the full namespace) and the declarations (which don't)
+    if include_namespace:
+        return f"at::functionalization::FunctionalInverses::{api_name}_inverse"
+    else:
+        return f"{api_name}_inverse"
+
+
+def returns_type(func: FunctionSchema) -> CType:
+    # Assertion: all view ops return tensor-like outputs
+    assert len(func.returns) >= 1
+    for ret in func.returns:
+        assert ret.type.is_tensor_like()
+    # However, the return type of the lambda is always an individual tensor.
+    # For multi-tensor outputs, each tensor needs to be tracked individually.
+    return BaseCType(tensorT)
+
+
+# Checks whether `func` might return more than one value.
+def is_multi_output(func: FunctionSchema) -> bool:
+    return len(func.returns) > 1 or (
+        len(func.returns) == 1 and func.returns[0].type.is_list_like() is not None
+    )
+
+
+# `ViewMeta` specialization constructor parameters.
+def base_ctor_arguments(func: FunctionSchema) -> list[Binding]:
+    # All specializations are parameterized by the `has_symbolic_inputs` flag.
+    arguments = [has_symbolic_inputs_binding]
+
+    # If `func` might return more than 1 value, we also parameterize this specialization
+    # with the output index.
+    if is_multi_output(func):
+        arguments.append(out_index_binding)
+
+    return arguments
+
+
+# `ViewMeta` specialized class' constructor arguments.
+#
+# Values needed specifically by this specialization, that the base class does not need.
+# Same as the class' attributes, but non-owning.
+def extra_ctor_arguments(func: FunctionSchema) -> list[Binding]:
+    return attributes(func, owning=False)
+
+
+# `ViewMeta` specialized class' non-static member data.
+#
+# Essential data for calling the instance's `forward` and `reverse` functions. You can
+# think of them as values that should be captured from the functionalization kernel.
+def attributes(func: FunctionSchema, owning: bool = True) -> list[Binding]:
+    args = func.arguments.flat_all
+    assert args[0].type == BaseType(BaseTy.Tensor)
+    return [
+        reapply_views_binding,
+        inverse_return_mode_binding,
+        *[dispatcher.argument(a, remove_non_owning_ref_types=owning) for a in args[1:]],
+    ]
+
+
+def op_arguments(func: FunctionSchema, is_reverse: bool) -> list[Binding]:
+    args = func.arguments.flat_all
+    assert args[0].type == BaseType(BaseTy.Tensor)
+    non_self_args = args[1:]
+    # The forward lambda calls the at::_ops API, while the reverse lambda calls the view inverse API.
+    # Both of these follow the dispatcher API.
+    non_self_bindings = [dispatcher.argument(a) for a in non_self_args]
+    if not is_reverse:
+        # the forward lambda swaps out the original tensor argument with the lambda arg "base"
+        return [base_binding] + non_self_bindings
+    else:
+        # the reverse lambda does the same, but with an additional "mutated_view" arg
+        # additionally, we have a calling convention: for view ops that return multiple tensor outputs,
+        # their corresponding view_inverse function takes in an additional index argument.
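+        # E.g. (hypothetical multi-output case): for a view op returning
+        # Tensor[], the reverse call-site binds
+        # (base, mutated_view, inverse_return_mode, out_index, <non-self args>).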
+ if is_multi_output(func): + return [ + base_binding, + mutated_view_binding, + inverse_return_mode_binding, + out_index_binding, + ] + non_self_bindings + else: + return [ + base_binding, + mutated_view_binding, + inverse_return_mode_binding, + ] + non_self_bindings diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/lazy.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/lazy.py new file mode 100644 index 0000000000000000000000000000000000000000..1d308afd8136a4e4d3c0b5eb1b89fcbd00c0a5c5 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/lazy.py @@ -0,0 +1,468 @@ +from __future__ import annotations + +from typing import Any + +from torchgen.api.types import ( + BaseCppType, + BaseCType, + boolT, + CType, + deviceT, + doubleT, + generatorT, + layoutT, + ListCType, + longT, + memoryFormatT, + NamedCType, + OptionalCType, + scalarT, + scalarTypeT, + stringT, + SymIntT, + VectorCType, +) +from torchgen.model import ( + Argument, + BaseTy, + BaseType, + FunctionSchema, + ListType, + OperatorName, + OptionalType, + Return, + TensorOptionsArguments, + Type, +) + + +_valueT: BaseCppType | None = None + + +# A ValueT is an IR type which represents the computation of a Tensor. In other +# words, a PyTorch user will do operations on lazy tensors, and each output lazy +# tensor internally tracks a ValueT representing the IR node that would have +# actually produced the value of this tensor for real. +# +# This is configurable because different lazy tensor backends (LTC vs XLA) will +# have different IR representations. (Though, arguably, after unification they +# shouldn't!) +def getValueT() -> BaseCppType: + global _valueT + if not _valueT: + raise NotImplementedError( + "The value type needs to be set with setValueT() in run_gen_lazy_tensor()" + ) + + return _valueT + + +def setValueT(val: BaseCppType) -> None: + global _valueT + _valueT = val + + +# this is a bad hack. I need to refactor the data model to represent each arg in the schema as an object, +# making it easier to represent special properties of an arg. +tensorListValueT = BaseCppType("torch::lazy", "Value") + + +def process_ir_type( + typ: Type, properties: LazyIrProperties, *, symint: bool +) -> BaseCType | VectorCType | OptionalCType | ListCType: + """ + This function takes a type from NativeFunctions and converts it for use with + lazy tensor codegen. + + Type conversion for lazy currently consists of + (1) changing at::Tensors into lazy::Values + (2) wrapping everything in a BaseCType + (3) making cpp-reference types into cpp-value types (e.g. vector instead of IntArrayRef) + + (1) converts at::Tensors to lazy::Values (which wrap lazy::Nodes, with which Lazy IR represents tensors.) + There is special handling for Optional[Tensor] or list[Tensor], etc- hence 'tensor-like' + + This is incomplete- there are assertions in places that it's expected to need to add + more types as the codegen is used with more operators. 
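+    For example (illustrative): `Tensor` maps to `BaseCType(getValueT())`,
+    `Tensor?` to `OptionalCType(BaseCType(getValueT()))`, and `int[]` to
+    `VectorCType(BaseCType(longT))`.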
+ """ + if isinstance(typ, BaseType): + if typ.name == BaseTy.Tensor: + return BaseCType(getValueT()) + elif typ.name == BaseTy.Scalar: + if properties.TreatScalarsAsConstants: + return BaseCType(scalarT) + # at::scalar has special handling, + # and is wrapped in an lazy::Value just like at::tensor + return BaseCType(getValueT()) + elif typ.name == BaseTy.ScalarType: + return BaseCType(scalarTypeT) + elif typ.name == BaseTy.int: + return BaseCType(longT) + elif typ.name == BaseTy.SymInt: + if symint: + return BaseCType(getValueT()) + else: + return BaseCType(longT) + elif typ.name == BaseTy.bool: + return BaseCType(boolT) + elif typ.name == BaseTy.float: + return BaseCType(doubleT) + elif typ.name == BaseTy.str: + return BaseCType(stringT) + elif typ.name == BaseTy.Device: + return BaseCType(deviceT) + elif typ.name == BaseTy.Generator: + return BaseCType(generatorT) + elif typ.name == BaseTy.Layout: + return BaseCType(layoutT) + elif typ.name == BaseTy.MemoryFormat: + return BaseCType(memoryFormatT) + else: + raise AssertionError(f"TODO add support for type {repr(typ)}") + elif isinstance(typ, OptionalType): + return OptionalCType(process_ir_type(typ.elem, properties, symint=symint)) + elif isinstance(typ, ListType): + if str(typ.elem) == "Tensor?": + # TODO(whc) is this actually correct? or should it use a Vector like above + return ListCType(OptionalCType(BaseCType(getValueT()))) + elif str(typ.elem) == "Tensor": + # this is a TensorList which comes in from GetTensorList as a Value + return BaseCType(tensorListValueT) + elif typ.elem == BaseType(BaseTy.SymInt): + # TODO: return a value type. The problem here is analogous to + # the problem with tensorListValueT: if you have SymInt[] you + # cannot conveniently save the list of Value directly, as nodes + # expect to save values as a vector for ALL arguments. So you + # need a separate IR node that represents all of the size nodes + # assembled into a list. I'm not an LTC dev so I don't want to + # figure it out right now. Y'all figure it out... + return VectorCType(BaseCType(longT)) + + else: + return VectorCType(process_ir_type(typ.elem, properties, symint=symint)) + else: + raise AssertionError(f"unrecognized type {repr(typ)}") + + +# TODO: Determining this based off of CType is bad; this should be computed +# from Type directly; then the same logic as process_ir_type can be used +# +# Invariant: passed typ should be an *owning* CType (e.g., we will report +# that ArrayRef is NOT a value type) +def isValueType(typ: CType, properties: LazyIrProperties | None = None) -> bool: + """ + Given a type, determine if it is a Value-like type. This is equivalent to + being Tensor-like, but assumes the type has already been transformed. 
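+    For example (illustrative): `BaseCType(getValueT())` and
+    `OptionalCType(BaseCType(getValueT()))` are Value-like, whereas
+    `BaseCType(longT)` is not.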
+ """ + if isinstance(typ, BaseCType): + # I am regretting my naming conventions, but now we are wrapping at::scalar in + # lazy value, while preserving other 'scalar' types as scalars in the IR + treat_scalars_as_constants = properties and properties.TreatScalarsAsConstants + return ( + typ.type == getValueT() + or (typ.type == scalarT and not treat_scalars_as_constants) + or typ.type == SymIntT + ) + elif typ == VectorCType(BaseCType(SymIntT)): + # TODO: report True for this + return False + elif isinstance(typ, (OptionalCType, ListCType, VectorCType)): + return isValueType(typ.elem, properties) + return False + + +def isSymIntType(typ: Type) -> bool: + return isinstance(typ, BaseType) and typ.name == BaseTy.SymInt + + +def isWrappedScalarType(typ: Type) -> bool: + """ + Given a type, determine if it is a c10::scalar which we will wrap in a lazy Value. + Since we literally change the type from scalarT to valueT, information is lost. + This function helps build a list of wrapped scalars to save that information + """ + if isinstance(typ, BaseType): + # I am regretting my naming conventions, but now we are wrapping at::scalar in + # lazy value, while preserving other 'scalar' types as scalars in the IR + return typ.name == BaseTy.Scalar + elif isinstance(typ, (OptionalType, ListType)): + return isWrappedScalarType(typ.elem) + return False + + +# TODO: dedupe with Type.is_generator_like +def isGeneratorType(typ: Type) -> bool: + if isinstance(typ, BaseType): + return typ.name == BaseTy.Generator + elif isinstance(typ, (OptionalType)): + return isGeneratorType(typ.elem) + return False + + +# This class caches a few derived properties computed from an Argument +# and LazyIrProperties +class LazyArgument: + name: str + orig_type: Type + lazy_type_: CType | None + is_wrapped_scalar: bool + is_generator: bool + # TODO: this is lies, it is false for symint list + is_symint_or_list: bool + + # Whether or not we are treating this as symint or not + symint: bool + + # true if this argument is or contains a lazy IR value + is_lazy_value: bool + + def __init__( + self, arg: Argument, properties: LazyIrProperties, *, symint: bool + ) -> None: + self.name = arg.name + self.orig_type = arg.type + self.symint = symint + self.is_optional = isinstance(arg.type, OptionalType) + self.is_generator = isGeneratorType(arg.type) + self.lazy_type_ = process_ir_type(arg.type, properties, symint=symint) + self.is_wrapped_scalar = isWrappedScalarType(arg.type) + self.is_symint_or_list = symint and ( + isSymIntType(arg.type) + or (isinstance(arg.type, OptionalType) and isSymIntType(arg.type.elem)) + # TODO: lists of symints are not currently treated as value types + # or (isinstance(arg.type, ListType) and isSymIntType(arg.type.elem)) + ) + + self.is_lazy_value = isValueType(self.lazy_type, properties) + + @property + def lazy_type(self) -> CType: + assert self.lazy_type_ is not None, ( + f"Attempted to access lazy_type for invalid argument {self.name}" + ) + return self.lazy_type_ + + +class LazyIrProperties: + """Collection of properties for an IR node + + The property groups are listed below. Each group is mutually + exclusive, meaning that only one property from each group can be True + at any one time. The properties can be accessed as if they were normal + attributes. The mutual exclusivity is automatically handled. + """ + + Properties: tuple[tuple[str, ...], ...] 
= ( + ( + "ShapePrecompute", # Assume shape has been precomputed + "ShapeCompute", # Need to compute the shape on construction + "ShapeCache", # Utilize the shape cache to defer computation + ), + ( + "Lower", # Codegen full lower function + "LowerDeclOnly", # Codegen only lower function declaration + ), + ( + "CanBeReused", # Codegen full reuse function + "CanBeReusedDeclOnly", # Codegen only reuse function declaration + ), + ( + "CreateFn", # Codegen full create function + "CreateFnDeclOnly", # Codegen only create function declaration + ), + ( + "TreatScalarsAsConstants", # Treat Scalars as constants instead of handling like values + ), + ) + + def __init__(self, *default_properties: str) -> None: + properties: dict[tuple[str, ...], str | None] = dict.fromkeys( + LazyIrProperties.Properties + ) + self.__dict__["properties"] = properties + for p in default_properties: + setattr(self, p, True) + + def __getattr__(self, key: str) -> Any: + properties = self.__dict__["properties"] + for values in LazyIrProperties.Properties: + if key in values: + return properties[values] == key + + return self.__getattribute__(key) + + def __setattr__(self, key: str, value: Any) -> Any: + properties = self.__dict__["properties"] + for values in LazyIrProperties.Properties: + if key in values: + properties[values] = key if value else None + return value + + raise KeyError(f"Invalid property: {key}") + + +# Inspired by a FunctionSchema object, a LazyIrSchema holds the schema of a Lazy IR node. +# Unlike a FunctionSchema, it has no round-trippable string form (relating to the YAML), +# but carries type information from a native FunctionSchema modified for use with IR nodes, +# and preserving original argument names. +# +# TODO: This is not idiomatic with how other torchgen APIs transform on schema. +class LazyIrSchema: + # The name of the operator this function schema describes. + name: OperatorName + + positional_args: tuple[LazyArgument, ...] + keyword_args: tuple[LazyArgument, ...] + + # TODO: Need to handle collisions with argument names at some point + returns: tuple[Return, ...] 
+ + # if this schema has a Generator arg, list its orig ctype/name but don't + # build a LazyArgument since lazy IR doesn't support it + generator_arg: NamedCType | None = None + + # original function schema + func: FunctionSchema + + # Whether or not we are code-genning for SymInt or not + symint: bool + + properties: LazyIrProperties = LazyIrProperties( + # default properties + "ShapePrecompute", + "Lower", + "CanBeReused", + ) + opkind: str | None = None + + def __init__( + self, + func: FunctionSchema, + properties: LazyIrProperties | None = None, + *, + symint: bool, + ) -> None: + if properties: + self.properties = properties + + self.func = func + self.symint = symint + positional_args: list[LazyArgument] = [] + for arg_field in ["pre_self_positional", "self_arg", "post_self_positional"]: + if arg_field == "self_arg" and func.arguments.self_arg is not None: + arg = func.arguments.self_arg.argument + positional_args.append( + LazyArgument(arg, self.properties, symint=symint) + ) + elif getattr(func.arguments, arg_field) is not None: + positional_args.extend( + LazyArgument(arg, self.properties, symint=symint) + for arg in getattr(func.arguments, arg_field) + ) + self.positional_args = tuple(positional_args) + + keyword_args: list[LazyArgument] = [] + for arg_field in [ + "pre_tensor_options_kwarg_only", + "tensor_options", + "post_tensor_options_kwarg_only", + "out", + ]: + curr_args = getattr(func.arguments, arg_field) + if curr_args is not None: + if isinstance(curr_args, TensorOptionsArguments): + curr_args = curr_args.all() + for arg in curr_args: + if isGeneratorType(arg.type): + assert self.generator_arg is None, ( + "We expect there is only one generator arg" + ) + self.generator_arg = NamedCType( + arg.name, + arg.type, # type:ignore[arg-type] + ) + keyword_args.extend( + LazyArgument(arg, self.properties, symint=symint) + for arg in curr_args + ) + self.keyword_args = tuple(keyword_args) + self.name = func.name + self.returns = func.returns + + @property + def node_name(self) -> str: + """ + Return camel-case version of op in node. + + Note: This function also appends any `overload_name` in the operation. + For example, if the op is `bitwise_and.Tensor`, the returned name + will be `BitwiseAndTensor`. + """ + op_name = f"{self.name.name}_{self.name.overload_name}".lower() + return "".join(word.capitalize() or "" for word in op_name.split("_")) + + @property + def aten_name(self) -> str: + return str(self.name.name) + + @property + def base_name(self) -> str: + return f"{self.name.name.base}" + + def filtered_args( + self, + positional: bool = True, + keyword: bool = True, + values: bool = True, + scalars: bool = True, + generator: bool = True, + ) -> list[LazyArgument]: + # This function maintains the sorted order of arguments but provides different filtered views. + # Some parts of the code care about kwargs vs args (TS lowerings), + # other parts care about whether they need to wrap the arg in a lazy value or leave it alone. + # Generators are special cased, as they are needed for fallback/shape-inference but not supported + # in TS lowerings and therefore also omitted from lazy IR. 
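+        # Illustrative call (schema assumed for exposition): for a schema with
+        # a Tensor `self` and a bool `keepdim`, filtered_args(values=True,
+        # scalars=False) keeps `self` (a lazy value) and drops `keepdim`.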
+ args: list[LazyArgument] = [] + if positional: + args.extend(self.positional_args) + if keyword: + args.extend(self.keyword_args) + + if values and scalars and generator: + return args + elif values and scalars: + return [a for a in args if not a.is_generator] + elif values: + return [a for a in args if a.is_lazy_value] + elif scalars: + return [ + a + for a in args + if not a.is_lazy_value and (generator or not a.is_generator) + ] + + return [] + + @property + def positional_values(self) -> list[LazyArgument]: + return self.filtered_args( + positional=True, keyword=False, values=True, scalars=False + ) + + @property + def positional_scalars(self) -> list[LazyArgument]: + return self.filtered_args( + positional=True, keyword=False, values=False, scalars=True + ) + + @property + def keyword_values(self) -> list[LazyArgument]: + return self.filtered_args( + positional=False, keyword=True, values=True, scalars=False + ) + + @property + def keyword_scalars(self) -> list[LazyArgument]: + return self.filtered_args( + positional=False, keyword=True, values=False, scalars=True + ) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/meta.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/meta.py new file mode 100644 index 0000000000000000000000000000000000000000..2e99d151faeaccea7ca47f372fd26f9985ce7249 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/meta.py @@ -0,0 +1,13 @@ +from torchgen.model import NativeFunctionsGroup + + +# Follows dispatcher calling convention, but: +# - Mutable arguments not allowed. Meta functions are always +# written in functional form. Look at FunctionSchema.signature() +# - No tensor returns; instead we return a TensorMeta describing +# the tensor in question + + +def name(g: NativeFunctionsGroup) -> str: + # use the overload name from the functional version + return str(g.functional.func.name).replace(".", "_") diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/native.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/native.py new file mode 100644 index 0000000000000000000000000000000000000000..632216704d2d47606b977d487335ca196e2e1842 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/native.py @@ -0,0 +1,159 @@ +from __future__ import annotations + +from typing import TYPE_CHECKING +from typing_extensions import assert_never + +from torchgen import local +from torchgen.api import cpp +from torchgen.api.types import ( + ArgName, + BaseCType, + Binding, + boolT, + ConstRefCType, + CType, + deviceT, + layoutT, + ListCType, + MutRefCType, + NamedCType, + OptionalCType, + scalarT, + scalarTypeT, + tensorT, +) +from torchgen.model import ( + Argument, + FunctionSchema, + Return, + SelfArgument, + TensorOptionsArguments, + Type, +) + + +if TYPE_CHECKING: + from collections.abc import Sequence + + +# This file describes the translation of JIT schema to the native functions API. +# This looks a lot like the C++ API (which makes historical sense, because the +# idea was you wrote native functions to implement functions in the C++ API), +# but over time we have evolved the C++ API without actually changing our +# native:: kernels. The intention is to make native API and dispatcher API +# line up as closely as possible, since this results in the least overhead +# (no translation is needed from dispatcher API to native API). +# +# NB: this is symint aware, you will get the non-SymInt variant for some +# dispatch entries and SymInt for others. 
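+# Illustrative naming (schema assumed for exposition): with `name()` below,
+# `sum.dim_IntList` becomes "sum_dim_IntList", while an out-variant overload
+# additionally gets an "_out" suffix inserted before the overload name.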
+ + +def name(func: FunctionSchema) -> str: + name = str(func.name.name) + # TODO: delete this! + if func.is_out_fn(): + name += "_out" + if func.name.overload_name: + name += f"_{func.name.overload_name}" + return name + + +def argumenttype_type( + t: Type, *, mutable: bool, binds: ArgName, symint: bool +) -> NamedCType: + if str(t) == "Tensor?": + tensor_type: OptionalCType = OptionalCType(BaseCType(tensorT)) + if mutable and not local.use_const_ref_for_mutable_tensors(): + return NamedCType(binds, MutRefCType(tensor_type)) + else: + return NamedCType(binds, ConstRefCType(tensor_type)) + elif str(t) == "Tensor?[]": + return NamedCType( + binds, ConstRefCType(ListCType(OptionalCType(BaseCType(tensorT)))) + ) + elif str(t) == "Scalar": + return NamedCType(binds, ConstRefCType(BaseCType(scalarT))) + elif str(t) == "Scalar?": + return NamedCType(binds, ConstRefCType(OptionalCType(BaseCType(scalarT)))) + return cpp.argumenttype_type(t, mutable=mutable, binds=binds, symint=symint) + + +def returns_type(rs: Sequence[Return], *, symint: bool) -> CType: + return cpp.returns_type(rs, symint=symint) + + +def argument_type(a: Argument, *, binds: ArgName, symint: bool) -> NamedCType: + return argumenttype_type(a.type, mutable=a.is_write, binds=binds, symint=symint) + + +def argument( + a: Argument | SelfArgument | TensorOptionsArguments, + *, + is_out: bool, + symint: bool, +) -> list[Binding]: + # Ideally, we NEVER default native functions. However, there are a number + # of functions that call native:: directly and rely on the defaulting + # existing. So for BC, we generate defaults for non-out variants (but not + # for out variants, where it is impossible to generate an appropriate + # default) + should_default = not is_out + if isinstance(a, Argument): + default: str | None = None + if should_default and a.default is not None: + default = cpp.default_expr(a.default, a.type, symint=symint) + return [ + Binding( + nctype=argument_type(a, binds=a.name, symint=symint), + name=a.name, + default=default, + argument=a, + ) + ] + elif isinstance(a, SelfArgument): + # Erase SelfArgument from the distinction + return argument(a.argument, is_out=is_out, symint=symint) + elif isinstance(a, TensorOptionsArguments): + default = None + if should_default: + default = "{}" + # TODO: Not sure why the arguments assigned here are for + # TensorOptionsArguments and not the constituent pieces. 
It seems + # to matter + return [ + Binding( + nctype=NamedCType("dtype", OptionalCType(BaseCType(scalarTypeT))), + name="dtype", + default=default, + argument=a, + ), + Binding( + nctype=NamedCType("layout", OptionalCType(BaseCType(layoutT))), + name="layout", + default=default, + argument=a, + ), + Binding( + nctype=NamedCType("device", OptionalCType(BaseCType(deviceT))), + name="device", + default=default, + argument=a, + ), + Binding( + nctype=NamedCType("pin_memory", OptionalCType(BaseCType(boolT))), + name="pin_memory", + default=default, + argument=a, + ), + ] + else: + assert_never(a) + + +def arguments(func: FunctionSchema, *, symint: bool) -> list[Binding]: + args: list[Argument | TensorOptionsArguments | SelfArgument] = [] + args.extend(func.arguments.non_out) + args.extend(func.arguments.out) + return [ + r for arg in args for r in argument(arg, symint=symint, is_out=func.is_out_fn()) + ] diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/python.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/python.py new file mode 100644 index 0000000000000000000000000000000000000000..dbfa73060163057e979d231c06f63bb29ea87daa --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/python.py @@ -0,0 +1,1548 @@ +from __future__ import annotations + +from dataclasses import dataclass +from typing import TYPE_CHECKING + +from torchgen.api import cpp +from torchgen.api.types import Binding, CppSignature, CppSignatureGroup +from torchgen.gen import pythonify_default +from torchgen.model import ( + Argument, + BaseTy, + BaseType, + FunctionSchema, + ListType, + NativeFunction, + OptionalType, + Return, + Type, + Variant, +) + + +if TYPE_CHECKING: + from collections.abc import Iterable, Sequence + + +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # +# +# Data Models +# +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # +# +# [Notes] python binding codegen +# +# The Python binding codegen produces code that takes the input list of +# PyObjects, finds the matching ATen C++ function using PythonArgParser, +# converts the PyObjects into C++ types and calls the ATen C++ function: +# +# +--------+ parsing +------------------------+ binding +-----------------------+ +# | PyObjs | ---------> | PythonArgParser Output | ---------> | Cpp Function Dispatch | +# +--------+ +------------------------+ +-----------------------+ +# +# The following examples demonstrate the data models the Python binding +# codegen needs to deal with and the tasks it needs to accomplish. It +# helps understand the purpose of the new data types we introduced below. +# +# - Function Schema (source of truth) +# +# aten::empty.names(int[] size, *, Dimname[]? names, +# ScalarType? dtype=None, Layout? layout=None, +# Device? device=None, bool? pin_memory=None, +# MemoryFormat? memory_format=None) -> Tensor +# +# - Python Signature +# +# It's used to generate input schema string for PythonArgParser. +# Note: TensorOptions fields are reordered and the additional +# 'requires_grad' field is added: +# +# empty(IntArrayRef size, *, DimnameList? names, +# MemoryFormat? memory_format=None, ScalarType dtype=None, +# Layout layout=torch.strided, Device device=None, +# bool pin_memory=False, bool requires_grad=False) +# +# - C++ Signature +# +# It's used to generate C++ lambda formals & dispatch call. +# Note: the scattered TensorOptions fields are packed into 'options'. 
+# +# auto dispatch_empty = +# [](IntArrayRef size, std::optional names, +# const TensorOptions & options, +# std::optional memory_format) -> Tensor { +# pybind11::gil_scoped_release no_gil; +# return torch::empty(size, names, options, memory_format); +# }; +# +# - Binding between Python Arguments and C++ Arguments +# +# Given a set of Python Arguments in scope, we need produce the +# binding expressions that translate the Python API into C++ API: +# +# Python Args Cpp Args Binding Exprs +# ----------------------------------------------------------------- +# 0: size size '_r.intlist(0)' +# 1: names names 'names' [special init] +# 2: memory_format -------+ +# 3: dtype -----+-|--> options 'options' [special packing] +# 4: layout / | +# 5: device / +--> memory_format '_r.memoryformatOptional(2)' +# 6: pin_memory / +# 7: requires_grad -+ +# +# So the full dispatch expression would look like: +# +# dispatch_empty(_r.intlist(0), names, options, +# _r.memoryformatOptional(2)) +# +# Where does 'names' come from? It involves special local init: +# +# auto __names = _r.toDimnameListOptional(1); +# std::optional names = +# __names ? std::make_optional(DimnameList(__names.value())) +# : std::nullopt; +# +# Where does 'options' come from? It involves special local init +# for TensorOptions. Note that Python side has the additional +# 'requires_grad' field: +# +# const auto options = TensorOptions() +# .dtype(_r.scalartype(3)) +# .device(_r.device(5)) +# .layout(_r.layoutOptional(4)) +# .requires_grad(_r.toBool(7)) +# .pinned_memory(_r.toBool(6)); +# +# In some other cases one Python Argument can map to multiple C++ +# Arguments. For example: +# +# aten::max.names_dim(Tensor self, Dimname dim, bool keepdim=False) +# -> (Tensor values, Tensor indices) +# +# Python Args Cpp Args Binding Exprs +# --------------------------------------------------------------------- +# +----> max 'out[0]' +# /-----> max_values 'out[1] +# 0: input / self '_r.tensor(0)' +# 1: dim / dim '_r.dimname(1)' +# 2: keepdim / keepdim '_r.toBool(2)' +# 3: out -----+ [local init] out '_r.tensorlist_n<2>(3)' +# +# As demonstrated above, the binding can involve reordering, +# packing, unpacking and special local inits. +# +# +# Let's look at a concrete example: +# +# static PythonArgParser parser({ +# "abs(Tensor input, *, Tensor out=None)", +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +# ^ +# +--- Python Schema, represented by PythonSignature and PythonArgument +# +# }, /*traceable=*/true); +# +# ParsedArgs<2> parsed_args; +# auto _r = parser.parse(nullptr, args, kwargs, parsed_args); +# +# ... 
+#
+# if (_r.isNone(1)) {
+#   ~~~~~~~~~~~~  <--- Scattered PythonArgParser output (arg name = 'out')
+#                      represented by PythonArgParserOutputExpr
+#
+#   // aten::abs(Tensor self) -> Tensor
+#   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#        ^
+#        +--- NativeFunction schema, base version
+#
+#   auto dispatch_abs = [](const Tensor & self) -> Tensor {
+#                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#                        ^
+#                        +--- dispatch_lambda_args / dispatch_lambda_return_str
+#                             generated from NativeFunction / CppSignature
+#                             (deprecated PythonSignature is special)
+#                             arguments are represented by DispatchLambdaArgument
+#
+#     pybind11::gil_scoped_release no_gil;
+#     return self.abs();
+#            ~~~~~~~~~~~  <--- cpp_dispatch_target / cpp_dispatch_exprs
+#                              generated from NativeFunction / CppSignature
+#   };
+#   return wrap(dispatch_abs(_r.tensor(0)));
+#               ~~~~~~~~~~~~~
+#                ^
+#                +--- dispatch_lambda_exprs
+#                     binding PythonArgParserOutputExpr (python args)
+#                     and DispatchLambdaArgument (c++ args)
+#
+# } else {
+#   // aten::abs.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
+#   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#        ^
+#        +--- NativeFunction schema, out-variant
+#
+#   auto dispatch_abs_out = [](Tensor out, const Tensor & self) -> Tensor {
+#     pybind11::gil_scoped_release no_gil;
+#     return at::abs_out(out, self);
+#   };
+#   return wrap(dispatch_abs_out(_r.tensor(1), _r.tensor(0)));
+# }
+#
+#
+# [Notes] python interface codegen
+# The python dataclasses below are used to generate both python binding code
+# and pyi type hint signatures.
+# In theory these two should look very similar, but there are a number of differences
+# in how pyi signatures vs. python_arg_parser signatures are generated.
+# These differences have been encapsulated in signature_str() vs. signature_str_pyi()
+# to display the full signatures, and argument_str() vs argument_str_pyi() to display arguments.
+# For example, only pyi signatures include return types.
+
+
+def format_function_signature(
+    name: str, arguments: Iterable[str] = (), return_type: str | None = None
+) -> str:
+    if not isinstance(arguments, (list, tuple)):
+        arguments = tuple(arguments)
+    return_type = f" -> {return_type}" if return_type is not None else ""
+
+    sig = f"def {name}({', '.join(arguments)}){return_type}: ..."
+    if len(sig) <= 80 or len(arguments) == 0 or tuple(arguments) == ("self",):
+        return sig
+
+    lines = [
+        f"def {name}(",
+        *(f"    {arg}," for arg in arguments),
+        f"){return_type}: ...",
+    ]
+    sig = "\n".join(lines)
+    if all(len(line) <= 80 for line in lines):
+        return sig
+    # ruff format bug for compound statements: https://github.com/astral-sh/ruff/issues/18658
+    # use `skip` instead of `on` + `off`
+    return sig.removesuffix(" ...") + "  # fmt: skip\n    ..."
+
+
+@dataclass(frozen=True)
+class PythonReturns:
+    returns: tuple[Return, ...]
+
+
+@dataclass(frozen=True)
+class PythonArgument:
+    name: str
+    type: Type
+    default: str | None
+
+    # Used to generate the default init expr for some PythonArgParser outputs, e.g.:
+    #
+    #   _r.layoutWithDefault(3, layout_from_backend(self.options().backend())))
+    #                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    #                            ^
+    #                            +--- default_init str
+    default_init: str | None
+
+    # Compute argument formal for python argument parsing.
+    # Needs to be consistent with torch/csrc/utils/python_arg_parser.h.
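+    # Illustrative output (argument assumed for exposition): a Tensor argument
+    # named "self" with no default renders as "Tensor input" in a function
+    # binding, since "self" is renamed to "input" outside of method bindings.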
+ def argument_str(self, *, method: bool = False, symint: bool = True) -> str: + type_str = ( + argument_type_str(self.type, symint=symint) + .replace("const ", "") + .replace(" &", "") + ) + + name = self.name + # s/self/input/ outside method bindings + # [old codegen] TODO: remove this? doesn't rename in codegen, it's just + # for the parse string + if name == "self" and type_str in ["Tensor", "Number"] and not method: + name = "input" + + # add default + if self.default is not None: + default = { + "nullptr": "None", + "::std::nullopt": "None", + "std::nullopt": "None", + "{}": "None", + }.get(self.default, self.default) + return f"{type_str} {name}={default}" + else: + return f"{type_str} {name}" + + def argument_str_pyi( + self, *, method: bool = False, deprecated: bool = False + ) -> str: + type_str = argument_type_str_pyi(self.type) + + name = self.name + # s/self/input/ outside method bindings + # [old codegen] TODO: remove this? doesn't rename in codegen, it's just + # for the parse string + if name == "self" and type_str == "Tensor" and not method and not deprecated: + name = "input" + + if name == "from": # from is a Python keyword... + name += "_" + + # pyi merges the _out and functional variants into the same signature, with an optional out arg + if name == "out" and type_str == "Tensor" and not deprecated: + type_str = f"{type_str} | None".replace(" | None | None", " | None") + + # pyi deprecated signatures don't get defaults for their out arg + treat_as_no_default = ( + deprecated + and isinstance(self, PythonOutArgument) + and self.default == "None" + ) + + # add default + if self.default is not None and not treat_as_no_default: + if ( + isinstance(self.type, ListType) + and self.type.elem == BaseType(BaseTy.int) + and self.default.startswith("{") + and self.default.endswith("}") + ): + default = ( + "(" + ", ".join(map(str.strip, self.default[1:-1].split(","))) + ")" + ) + else: + default = { + "nullptr": "None", + "::std::nullopt": "None", + "std::nullopt": "None", + "{}": "None", + "c10::MemoryFormat::Contiguous": "contiguous_format", + "QScheme::PER_TENSOR_AFFINE": "per_tensor_affine", + }.get(self.default, self.default) + return f"{name}: {type_str} = {default}" + else: + return f"{name}: {type_str}" + + +@dataclass(frozen=True) +class PythonOutArgument(PythonArgument): + # In Python signature multiple output fields are packed into one 'out' argument. + # When binding to C++, it's first binded to a local 'out' variable: + # 'auto out = _r.tensorlist_n<2>(2);', + # then binded to scattered C++ output arguments as 'out[0]', 'out[1]', and etc. + # TODO: maybe don't need keep scattered out fields for python signature? + outputs: tuple[PythonArgument, ...] + + @staticmethod + def from_outputs(outputs: tuple[PythonArgument, ...]) -> PythonOutArgument | None: + if not outputs: + return None + + size = len(outputs) + if size == 1: + return PythonOutArgument( + name=outputs[0].name, + type=outputs[0].type, + default="None", + default_init=None, + outputs=outputs, + ) + elif size > 1: + if any(not a.type.is_tensor_like() for a in outputs): + raise RuntimeError(f"Unsupported output type: {outputs}") + return PythonOutArgument( + name="out", + # TODO: shouldn't this be OptionalType[ListType[...]], since it defaults to None? 
+ type=ListType(BaseType(BaseTy.Tensor), size), + default="None", + default_init=None, + outputs=outputs, + ) + raise AssertionError(r"Unexpected PythonOutArgument size") + + +@dataclass(frozen=True) +class PythonSignature: + # Base operator name, without inplace/outplace suffix. + name: str + + # Positional arguments. + # TODO: create a dedicated SelfArgument type for 'self'? + input_args: tuple[PythonArgument, ...] + + # Keyword arguments excluding the 'out' argument and scattered kwargs belonging + # to TensorOptions (dtype, layout, device, pin_memory, requires_grad, etc). + input_kwargs: tuple[PythonArgument, ...] + + output_args: PythonOutArgument | None + + # Return types, which are only used by pyi + returns: PythonReturns + + # These are scattered kwargs arguments belonging to TensorOptions. + # When binding to C++, they are packed into a TensorOptions object 'options'. + # It's possible that the C++ signature doesn't take TensorOptions object (e.g. + # for out variant), in which case they will be used as scattered fields without + # being packed into 'options'. + # TODO: maybe create a PythonTensorOptionsArgument? + tensor_options_args: tuple[PythonArgument, ...] + + # method or function signature? + method: bool + + @property + def deprecated(self) -> bool: + return False + + def arguments( + self, *, skip_outputs: bool = False, skip_tensor_options: bool = False + ) -> tuple[PythonArgument | PythonOutArgument, ...]: + result: list[PythonArgument | PythonOutArgument] = [] + result.extend(self.input_args) + result.extend(self.input_kwargs) + if self.output_args is not None and not skip_outputs: + result.append(self.output_args) + if not skip_tensor_options: + result.extend(self.tensor_options_args) + return tuple(result) + + def arguments_count(self) -> int: + return len(self.arguments()) + + def output_idx(self) -> int: + return len(self.input_args) + len(self.input_kwargs) + + # [old codegen] Compute the Python function signature for argument parsing, + # as specified in torch/csrc/utils/python_arg_parser.h. WARNING: + # this is NOT the same type signature as specified by PEP 484 + # as understood by mypy; our format was independently developed + # and has some quirks to make it more suitable specifically + # for error parsing. + # + # For a translation to mypy-valid type signatures, see + # signature_str_pyi(). 
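+    # Illustrative output, reusing the aten::abs example from the file header:
+    # signature_str() produces "abs(Tensor input, *, Tensor out=None)".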
+    def signature_str(self, *, skip_outputs: bool = False, symint: bool = True) -> str:
+        args = self.arguments(skip_outputs=skip_outputs)
+        schema_formals: list[str] = [
+            a.argument_str(method=self.method, symint=symint) for a in args
+        ]
+        positional_argc = len(self.input_args)
+        if len(schema_formals) > positional_argc:
+            schema_formals.insert(positional_argc, "*")
+
+        return f"{self.name}({', '.join(schema_formals)})"
+
+    def signature_str_pyi(self, *, skip_outputs: bool = False) -> str:
+        args = self.arguments(skip_outputs=skip_outputs)
+        schema_formals: list[str] = [
+            a.argument_str_pyi(method=self.method) for a in args
+        ]
+        positional_argc = len(self.input_args)
+        if len(schema_formals) > positional_argc:
+            schema_formals.insert(positional_argc, "*")
+
+        # only pyi signatures include returns
+        returns_str = returns_str_pyi(self)
+        # pyi also includes self (with no typing/defaults) for methods
+        if self.method:
+            schema_formals.insert(0, "self")
+        return format_function_signature(self.name, schema_formals, returns_str)
+
+    def signature_str_pyi_vararg(self, *, skip_outputs: bool = False) -> str | None:
+        # only pyi uses vararg signatures
+        args = self.arguments(skip_outputs=skip_outputs)
+        schema_formals: list[str] = [
+            a.argument_str_pyi(method=self.method) for a in args
+        ]
+        # vararg only applies to pyi signatures. vararg variants are not generated for all signatures
+        num_args = self.arguments_count()
+        if num_args == 0:
+            return None
+
+        num_positionalargs = len(self.input_args)
+
+        vararg_type = args[0].type
+        if not (
+            isinstance(vararg_type, ListType)
+            and str(vararg_type.elem) in ["int", "SymInt"]
+            and num_positionalargs == 1
+        ):
+            return None
+
+        # Below are the major changes in vararg vs. regular pyi signatures.
+        # vararg signatures also omit the asterisk
+        assert isinstance(vararg_type, ListType)
+        schema_formals[0] = (
+            "*" + args[0].name + ": " + argument_type_str_pyi(vararg_type.elem)
+        )
+
+        returns_str = returns_str_pyi(self)
+        # pyi also includes self (with no typing/defaults) for methods
+        if self.method:
+            schema_formals.insert(0, "self")
+        return format_function_signature(self.name, schema_formals, returns_str)
+
+
+# The deprecated python signature involves some special logic, so create a
+# dedicated data model to store these extra properties.
+@dataclass(frozen=True)
+class PythonSignatureDeprecated(PythonSignature):
+    # Schema for the deprecated function
+    deprecated_schema: FunctionSchema
+
+    # The deprecated signature might miss some arguments that the corresponding
+    # C++ signature expects. We need to store the constant default values to pass in.
+    # For example:
+    #   [deprecated signature]: addmm(Scalar beta, Tensor self, Tensor mat1, Tensor mat2)
+    #   [func schema]: aten::addmm(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> Tensor
+    #   [func call]: self.addmm(mat1, mat2, beta, 1)
+    # We store ['self', 'mat1', 'mat2', 'beta', '1'] in this case.
+    deprecated_args_exprs: tuple[str, ...]
+ + @property + def deprecated(self) -> bool: + return True + + def signature_str(self, *, skip_outputs: bool = False, symint: bool = True) -> str: + return ( + PythonSignature.signature_str( + self, skip_outputs=skip_outputs, symint=symint + ) + + "|deprecated" + ) + + def signature_str_pyi(self, *, skip_outputs: bool = False) -> str: + args = self.arguments(skip_outputs=skip_outputs) + schema_formals: list[str] = [ + a.argument_str_pyi(method=self.method, deprecated=True) for a in args + ] + positional_argc = len(self.input_args) + if len(schema_formals) > positional_argc: + schema_formals.insert(positional_argc, "*") + + returns_str = returns_str_pyi(self) + return format_function_signature(self.name, schema_formals, returns_str) + + def signature_str_pyi_vararg(self, *, skip_outputs: bool = False) -> str | None: + # the codegen doesn't include vararg variants for deprecated signatures + return None + + +# This struct is used to hold the PythonSignature and its corresponding +# NativeFunction BEFORE grouping base and out-variant functions. +# Why not store NativeFunction in PythonSignature or construct PythonSignature +# from NativeFunction? Because they are not 1-1 mapped. +# One native function could have both deprecated and non-deprecated python +# signatures - NativeFunction doesn't contain information to construct the +# deprecated python signature. +# One python signature is used to handle both the base and the out-variant +# function - see 'PythonSignatureGroup'. +@dataclass(frozen=True) +class PythonSignatureNativeFunctionPair: + signature: PythonSignature + function: NativeFunction + + +# We merge pairs of functions with signatures that are equivalent mod +# output arguments, and use a single entry in the python_arg_parser sig +# list for both (output arguments become optional). +@dataclass(frozen=True) +class PythonSignatureGroup: + # The signature used for Python argument parsing. The outplace signature + # is preferred if exists, because it can be used to parse inputs for both + # the out-place variant and the base version (with output omitted). + signature: PythonSignature + + # The regular ATen declaration (e.g. conv2d) + base: NativeFunction + + # The out variant (e.g. conv2d_out) + outplace: NativeFunction | None + + @classmethod + def from_pairs( + cls, + functional: PythonSignatureNativeFunctionPair, + out: PythonSignatureNativeFunctionPair | None, + ) -> PythonSignatureGroup: + if out is None: + return PythonSignatureGroup( + signature=functional.signature, + base=functional.function, + outplace=None, + ) + + # prefer the signature with optional out=... arguments because it's the + # superset that can be used to parse input for both base and outplace. + signature_kwargs = out.signature.__dict__.copy() + + # Out overloads in C++ don't have TensorOptions arguments, + # so take these from the functional variant + signature_kwargs["tensor_options_args"] = ( + functional.signature.tensor_options_args + ) + + return PythonSignatureGroup( + signature=type(out.signature)(**signature_kwargs), + base=functional.function, + outplace=out.function, + ) + + +# C++ function dispatch is wrapped in a lambda function. The lambda function +# has almost the same signature as the C++ function, only with some small +# variants - see details below. +# This data model is used to represent arguments of the lambda function +# signature. 
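+# Illustrative instance (rendered strings assumed, not from this diff): the
+# `self` formal of a dispatch lambda for a plain function is roughly
+# DispatchLambdaArgument(name="self", type_str="const at::Tensor &", is_out_arg=False).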
+@dataclass(frozen=True)
+class DispatchLambdaArgument:
+    name: str
+    type_str: str
+    is_out_arg: bool
+
+
+# To pass PyObjects arguments to the C++ function (via the lambda wrapper),
+# we first need to convert the PyObjects into simple C++ objects. This work
+# is done by PythonArgParser.
+# This data model is used to represent the output of PythonArgParser.
+# It has a 1-1 mapping with PythonArgument in PythonSignature.
+@dataclass(frozen=True)
+class PythonArgParserOutputExpr:
+    # argument name
+    name: str
+
+    # RHS expression to reference PythonArgParser output.
+    expr: str
+
+    # In some special cases we need to create a different expr, e.g.:
+    # '_r.isNone(1)' instead of '_r.tensor(1)'.
+    index: int
+
+    # The python argument it maps to.
+    argument: PythonArgument
+
+    @property
+    def is_none_expr(self) -> str:
+        return f"_r.isNone({self.index})"
+
+
+# To pass PythonArgParser output to the lambda wrapper, we need to bind
+# PythonArgParserOutputExpr to DispatchLambdaArgument.
+# They are not always 1-1 mapped, e.g. scattered TensorOptions fields
+# need to be packed into a TensorOptions object, which is the argument
+# that the lambda function wrapper takes.
+@dataclass(frozen=True)
+class DispatchLambdaArgumentExprs:
+    # The exprs that provide the binding for lambda arguments, e.g.:
+    #
+    #   'self' -> '_r.tensor(0)'
+    #   'min' -> 'out[0]' / 'min_indices' -> 'out[1]'
+    #   'options' -> 'options'
+    #
+    # It has a 1-1 mapping with DispatchLambdaArgument.
+    exprs: Sequence[str]
+
+    # Special local inits, which might introduce new variables that
+    # the 'exprs' above reference, e.g.:
+    #
+    #   'auto out = _r.tensorlist_n<2>(2);'
+    #
+    inits: Sequence[str]
+
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
+#
+#                          Helper Functions
+#
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
+
+
+def _cpp_signature(f: NativeFunction, *, method: bool = False) -> CppSignature:
+    return CppSignatureGroup.from_native_function(f, method=method).signature
+
+
+def has_tensor_options(f: NativeFunction) -> bool:
+    return f.func.arguments.tensor_options is not None
+
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
+#
+#                          Python Signature
+#
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
+
+
+# 'simple_type' was introduced by the old codegen, which is slightly
+# different from the python schema type, e.g.: doesn't have '?' suffix
+# for optional Tensor/TensorList; doesn't have '[size]' suffix for list type.
+def argument_type_str(
+    t: Type, *, simple_type: bool = False, symint: bool = True
+) -> str:
+    if isinstance(t, BaseType):
+        if t.name == BaseTy.int:
+            return "int64_t"
+        elif t.name == BaseTy.float:
+            return "double"
+        elif t.name == BaseTy.str:
+            return "c10::string_view"
+        elif t.name in [
+            BaseTy.Tensor,
+            BaseTy.bool,
+            BaseTy.QScheme,
+            BaseTy.Scalar,
+            BaseTy.ScalarType,
+            BaseTy.Generator,
+            BaseTy.Storage,
+            BaseTy.Layout,
+            BaseTy.Device,
+            BaseTy.DeviceIndex,
+            BaseTy.MemoryFormat,
+            BaseTy.Dimname,
+            BaseTy.Stream,
+            BaseTy.SymInt,
+        ]:
+            # These python schema type names line up with their function schema names
+            return t.name.name
+
+    elif isinstance(t, OptionalType):
+        elem = argument_type_str(t.elem, simple_type=simple_type, symint=symint)
+        return f"{elem}?"
+    elif isinstance(t, ListType):
+        size = t.size if not simple_type else None
+        if str(t.elem) == "bool":
+            assert t.size is not None
+            return f"::std::array<bool,{t.size}>"
+        elif str(t.elem) == "int":
+            return f"IntArrayRef[{size}]" if size is not None else "IntArrayRef"
+        elif str(t.elem) == "SymInt":
+            if symint:
+                return (
+                    f"SymIntArrayRef[{size}]" if size is not None else "SymIntArrayRef"
+                )
+            else:
+                return f"IntArrayRef[{size}]" if size is not None else "IntArrayRef"
+        elif str(t.elem) == "Tensor":
+            return f"TensorList[{size}]" if size is not None else "TensorList"
+        elif str(t.elem) == "Scalar":
+            return f"ScalarList[{size}]" if size is not None else "ScalarList"
+        elif str(t.elem) == "Tensor?":
+            if simple_type:
+                return "c10::List<::std::optional<Tensor>>"
+            else:
+                return "const c10::List<::std::optional<Tensor>> &"
+        elif str(t.elem) == "Dimname":
+            return f"DimnameList[{size}]" if size is not None else "DimnameList"
+        elem = argument_type_str(t.elem, simple_type=simple_type, symint=symint)
+        return f"ArrayRef<{elem}>"
+
+    raise RuntimeError(f"unrecognized type {repr(t)}")
+
+
+def argument_type_size(t: Type) -> int | None:
+    l = t.is_list_like()
+    if l is not None and str(l.elem) != "bool":
+        return l.size
+    else:
+        return None
+
+
+def argument(a: Argument) -> PythonArgument:
+    return PythonArgument(
+        name=a.name,
+        type=a.type,
+        # TODO: directly translate a.default to python default
+        default=(
+            str(pythonify_default(cpp.default_expr(a.default, a.type, symint=False)))
+            if a.default is not None
+            else None
+        ),
+        default_init=None,
+    )
+
+
+# Generates a PythonSignature that can be used for either .pyi or PythonArgParser codegen
+def signature(
+    f: NativeFunction, *, method: bool = False, pyi: bool = False
+) -> PythonSignature:
+    return signature_from_schema(
+        f.func, category_override=f.category_override, method=method, pyi=pyi
+    )
+
+
+def signature_from_schema(
+    func: FunctionSchema,
+    *,
+    category_override: str | None,
+    method: bool = False,
+    pyi: bool = False,
+) -> PythonSignature:
+    args: list[Argument] = []
+    args.extend(func.arguments.pre_self_positional)
+    # Skip SelfArgument if this is method.
+    if not method and func.arguments.self_arg is not None:
+        args.append(func.arguments.self_arg.argument)
+    args.extend(func.arguments.post_self_positional)
+    args.extend(func.arguments.pre_tensor_options_kwarg_only)
+    # Skip TensorOptionsArguments. Python side TensorOptions
+    # arguments are created based on different rules - see below.
+    args.extend(func.arguments.post_tensor_options_kwarg_only)
+    args.extend(func.arguments.out)
+
+    input_arg_set = {a.name for a in func.arguments.flat_positional}
+    kwarg_only_set = {a.name for a in func.arguments.flat_kwarg_only}
+    out_arg_set = {a.name for a in func.arguments.out}
+
+    input_args = tuple(map(argument, filter(lambda a: a.name in input_arg_set, args)))
+    input_kwargs = tuple(
+        map(argument, filter(lambda a: a.name in kwarg_only_set, args))
+    )
+    outputs = tuple(map(argument, filter(lambda a: a.name in out_arg_set, args)))
+
+    # Reintroduce the scattered fields of TensorOptions for Python.
+    # Compared to the cpp counterpart, the python arguments have a new property
+    # (default_init) and a new argument 'requires_grad', which require some
+    # special handling.
+    # [old codegen] TODO: because these aren't guaranteed to be 100% faithful
+    # to the original versions in the yaml, this recreation is a potential
+    # source of drift between eager and JIT. Pull this logic out to a shared place.
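+    # Illustrative effect (assumed factory op): for aten::empty.names, the
+    # python signature gains scattered dtype/layout/device/pin_memory args plus
+    # the extra requires_grad argument, as in the `empty` example at the top of
+    # this file.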
+ + has_tensor_input_arg = any( + a.type.is_tensor_like() for a in func.arguments.flat_non_out + ) + if any(a.name == "requires_grad" for a in func.schema_order_arguments()): + raise ValueError( + "argument named requires_grad is reserved, should not explicitly add it in the schema" + ) + + # [old codegen] this probably won't work if one of the returns is not a tensor, + # but it will produce a compile-time error that is obvious. + has_tensor_return = any(r.type.is_tensor_like() for r in func.returns) + + name: str = cpp.name(func) + is_factory_function = category_override == "factory" or ( + has_tensor_return and not has_tensor_input_arg + ) + is_like_or_new_function = ( + category_override in ("new", "like") + or name.startswith("new_") + or name.endswith("_like") + ) + is_dummy_function = category_override == "dummy" + + tensor_options_args: list[PythonArgument] = [] + if (is_factory_function or is_like_or_new_function) and not is_dummy_function: + + def topt_default_init(name: str) -> str | None: + topt_args = func.arguments.tensor_options + if topt_args is None: + return None + a = getattr(topt_args, name) + if a.default is None or a.default == "None": + return None + return cpp.default_expr(a.default, a.type, symint=False) + + tensor_options_args.append( + PythonArgument( + name="dtype", + type=OptionalType(BaseType(BaseTy.ScalarType)), + default="None", + default_init=( + None if is_like_or_new_function else topt_default_init("dtype") + ), + ) + ) + tensor_options_args.append( + PythonArgument( + name="layout", + type=OptionalType(BaseType(BaseTy.Layout)), + default="None", + default_init=( + None if is_like_or_new_function else topt_default_init("layout") + ), + ) + ) + tensor_options_args.append( + PythonArgument( + name="device", + type=OptionalType(BaseType(BaseTy.Device)), + default="None", + default_init=( + None + if is_like_or_new_function + else ( + topt_default_init("device") + or "torch::tensors::get_default_device()" + ) + ), + ) + ) + tensor_options_args.append( + PythonArgument( + name="pin_memory", + type=OptionalType(BaseType(BaseTy.bool)), + default="False", + default_init=None, + ) + ) + tensor_options_args.append( + PythonArgument( + name="requires_grad", + type=OptionalType(BaseType(BaseTy.bool)), + default="False", + default_init=None, + ) + ) + + returns = PythonReturns(returns=func.returns) + + return PythonSignature( + name=str(func.name.name), + input_args=input_args, + input_kwargs=input_kwargs, + output_args=PythonOutArgument.from_outputs(outputs), + tensor_options_args=tuple(tensor_options_args), + returns=returns, + method=method, + ) + + +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # +# +# Python Interface +# +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # + + +def structseq_fieldnames(returns: tuple[Return, ...]) -> list[str]: + if len(returns) <= 1 or all(r.name is None for r in returns): + return [] + else: + if any(r.name is None for r in returns): + # When building on Windows, `PyStructSequence_UnnamedField` could not be + # resolved by the linker for some reason, which cause error in building: + # + # python_nn_functions.cpp.obj : error LNK2001: unresolved external symbol + # PyStructSequence_UnnamedField + # + # Thus, at this point in time, we do not support unnamed + # fields in structseq; you must either name all fields, + # or none of them. 
+ raise ValueError("Unnamed field is not supported by codegen") + + return [str(r.name) for r in returns] + + +def argument_type_str_pyi(t: Type) -> str: + add_optional = False + if isinstance(t, OptionalType): + t = t.elem + add_optional = True + + ret = "" + if isinstance(t, BaseType): + if t.name in [BaseTy.int, BaseTy.DeviceIndex]: + ret = "_int" + if t.name == BaseTy.SymInt: + ret = "_int | SymInt" + elif t.name == BaseTy.float: + ret = "_float" + elif t.name == BaseTy.str: + ret = "str" + elif t.name == BaseTy.Scalar: + ret = "Number | _complex" + elif t.name == BaseTy.ScalarType: + ret = "_dtype" + elif t.name == BaseTy.bool: + ret = "_bool" + elif t.name == BaseTy.QScheme: + ret = "_qscheme" + elif t.name == BaseTy.Layout: + ret = "_layout" + elif t.name == BaseTy.Device: + ret = "DeviceLikeType | None" + elif t.name == BaseTy.MemoryFormat: + ret = "memory_format" + elif t.name == BaseTy.Dimname: + ret = "str | EllipsisType | None" + elif t.name == BaseTy.Storage: + ret = "Storage | UntypedStorage" + elif t.name in [BaseTy.Tensor, BaseTy.Generator, BaseTy.Stream]: + # These python schema type names line up with their function schema names + ret = t.name.name + + elif isinstance(t, ListType): + if str(t.elem) == "int": + ret = "_int | _size" if t.size is not None else "_size" + elif t.is_tensor_like(): + # TODO: this doesn't seem right... + # Tensor?[] currently translates to tuple[Tensor, ...] | list[Tensor] | None + # It should probably translate to tuple[Tensor | None, ...] | list[Tensor | None] + add_optional = True + ret = ( + "Tensor | tuple[Tensor, ...] | list[Tensor]" + if t.size is not None + else "tuple[Tensor, ...] | list[Tensor]" + ) + elif str(t.elem) == "float": + ret = "Sequence[_float]" + elif str(t.elem) == "SymInt" and t.size is not None: + elem = argument_type_str_pyi(t.elem) + ret = f"{elem} | Sequence[{elem}]" + else: + elem = argument_type_str_pyi(t.elem) + ret = f"Sequence[{elem}]" + + else: + raise RuntimeError(f"unrecognized type {repr(t)}") + + if add_optional: + ret = f"{ret} | None".replace(" | None | None", " | None") + + return ret + + +def return_type_str_pyi(t: Type) -> str: + # Where arguments are open to accepting Union, return types should return + # concrete types + + if isinstance(t, OptionalType): + inner = return_type_str_pyi(t.elem) + return f"{inner} | None".replace(" | None | None", " | None") + + if isinstance(t, BaseType): + if t.name == BaseTy.Device: + return "_device" + elif t.name == BaseTy.Dimname: + return "str | None" + else: + return argument_type_str_pyi(t) + + if isinstance(t, ListType): + inner = return_type_str_pyi(t.elem) + return f"tuple[{inner}, ...]" + + return argument_type_str_pyi(t) + + +def returns_structseq_pyi(signature: PythonSignature) -> tuple[str, str] | None: + python_returns = [return_type_str_pyi(r.type) for r in signature.returns.returns] + structseq_name = signature.name + field_names = structseq_fieldnames(signature.returns.returns) + if field_names: + # These types are structseq objects which act like named NamedTuples, but + # the constructor acts like the constructor of tuple. Using typing.NamedTuple + # does not allow us to override __init__. 
+        seq_type = f"tuple[{', '.join(python_returns)}]"
+        structseq_def_lines = [
+            f"class {structseq_name}({seq_type}): # fmt: skip",
+        ]
+        for name, ret_type in zip(field_names, python_returns):
+            structseq_def_lines.extend(
+                [
+                    "    @property",
+                    f"    def {name}(self) -> {ret_type}: ...",
+                ]
+            )
+        structseq_def_lines.extend(
+            [
+                "    def __new__(",
+                "        cls,",
+                f"        sequence: {seq_type},",
+                "    ) -> Self: # fmt: skip",
+                "        ...",
+                f"    n_fields: Final[_int] = {len(field_names)}",
+                f"    n_sequence_fields: Final[_int] = {len(field_names)}",
+                "    n_unnamed_fields: Final[_int] = 0",
+                "    def __init_subclass__(cls) -> NoReturn: ... # prohibit subclassing",
+                "",  # add an extra newline
+            ]
+        )
+        structseq_def = "\n".join(structseq_def_lines)
+        # Example:
+        # structseq_def = (
+        #     "class max(tuple[Tensor, Tensor]): # fmt: skip\n"
+        #     "    @property\n"
+        #     "    def values(self) -> Tensor: ...\n"
+        #     "    @property\n"
+        #     "    def indices(self) -> Tensor: ...\n"
+        #     "    def __new__(\n"
+        #     "        cls,\n"
+        #     "        sequence: tuple[Tensor, Tensor],\n"
+        #     "    ) -> Self: # fmt: skip\n"
+        #     "        ...\n"
+        #     "    n_fields: Final[_int] = 2",
+        #     "    n_sequence_fields: Final[_int] = 2",
+        #     "    n_unnamed_fields: Final[_int] = 0",
+        #     "    def __init_subclass__(cls) -> NoReturn: ... # prohibit subclassing",
+        # )
+        return structseq_name, structseq_def
+    return None
+
+
+def returns_str_pyi(signature: PythonSignature) -> str:
+    field_names = structseq_fieldnames(signature.returns.returns)
+    if field_names:
+        return f"torch.return_types.{signature.name}"
+
+    python_returns = [return_type_str_pyi(r.type) for r in signature.returns.returns]
+    if len(python_returns) > 1:
+        return "tuple[" + ", ".join(python_returns) + "]"
+    if len(python_returns) == 1:
+        return python_returns[0]
+    return "None"
+
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
+#
+#                        C++ Function Dispatch
+#
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
+# This section provides APIs to generate the code that does C++ function
+# dispatch. The C++ function call is wrapped by a lambda function.
+# For example:
+#
+#    // aten::selu_(Tensor(a!) self) -> Tensor(a!)
+#    auto dispatch_selu_ = [](Tensor self) -> Tensor {
+#      pybind11::gil_scoped_release no_gil;
+#      return at::selu_(self);
+#    };
+#
+# The lambda function's signature follows the C++ signature in common
+# cases, e.g.:
+#
+#   // aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
+#   [](const Tensor & self, const Tensor & other, Scalar alpha) -> Tensor
+#
+# For the out variant, the 'out' argument's type is changed from 'Tensor &'
+# to 'Tensor'. This is because the lambda is called with the PythonArgParser
+# output '_r.tensor(3)', which is a stack-allocated object that needs to be
+# passed by value. Also see the comments in 'dispatch_lambda_return_str()'.
+#
+#   // aten::add.out(Tensor self, Tensor other, *, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!)
+#   [](Tensor out, const Tensor & self, const Tensor & other, Scalar alpha) -> Tensor
+#
+# For the multi-output case it can keep using reference types, because the
+# PythonArgParser output has been unpacked to local variables, e.g.:
+#
+#   // aten::max.names_dim_max(Tensor self, Dimname dim, bool keepdim=False, *,
+#   //     Tensor(a!) max, Tensor(b!) max_values) -> (Tensor(a!) values, Tensor(b!) indices)
+#   [](Tensor & max, Tensor & max_values, const Tensor & self, Dimname dim, bool keepdim) -> std::tuple<Tensor, Tensor>
+#
+# For a deprecated python signature, it should follow the deprecated python arg order.
+# TODO: This is to keep the same byte-for-byte result as the old codegen - maybe unnecessary?
+
+
+def dispatch_lambda_args(
+    ps: PythonSignature, f: NativeFunction, symint: bool = True
+) -> tuple[DispatchLambdaArgument, ...]:
+    if isinstance(ps, PythonSignatureDeprecated):
+        schema = ps.deprecated_schema
+    else:
+        schema = f.func
+
+    # Start with cpp arguments - the dispatch lambda signature always includes 'self'
+    cpp_args = cpp.arguments(
+        arguments=schema.arguments,
+        faithful=False,
+        symint=symint,
+        method=False,
+        cpp_no_default_args=f.cpp_no_default_args,
+    )
+    out_args: set[str] = {a.name for a in schema.arguments.out}
+
+    # Convert from cpp argument to lambda argument
+    def dispatch_lambda_arg(cpp_arg: Binding) -> DispatchLambdaArgument:
+        type_str = cpp_arg.type
+        is_out_arg = cpp_arg.name in out_args
+        if ps.method and cpp_arg.name == "self":
+            # For a method's 'self', we can use 'const Tensor &' and simply ignore mutability!
+            type_str = "const at::Tensor &"
+        else:
+            # For other cases we need to prevent dangling refs to temps (unless it's
+            # an unpacked scattered output).
+            # The reason is explained in the comments above and in 'dispatch_lambda_return_str()'.
+            # TODO: avoid this special handling?
+            ensure_temp_safe = len(out_args) <= 1 or not is_out_arg
+            if ensure_temp_safe:
+                type_str = {
+                    "at::Tensor &": "at::Tensor",
+                }.get(type_str, type_str)
+        return DispatchLambdaArgument(
+            name=cpp_arg.name,
+            type_str=type_str,
+            is_out_arg=is_out_arg,
+        )
+
+    return tuple(map(dispatch_lambda_arg, cpp_args))
+
+
+# [old codegen] XXX: if you got here because of an assertion failure, it doesn't mean
+# it's enough to just extend the list here. Before you do this, make sure
+# to add an appropriate wrap() overload in torch/csrc/autograd/utils/wrap_outputs.h.
+SUPPORTED_RETURN_TYPES = {
+    "at::Tensor",
+    "::std::tuple",
+    "::std::tuple",
+    "::std::tuple",
+    "::std::tuple",
+    "::std::tuple",
+    "::std::tuple",
+    "::std::tuple",
+    "::std::tuple",
+    "::std::tuple",
+    "::std::tuple",
+    "::std::tuple<at::Tensor,::std::vector<at::Tensor>>",
+    "::std::vector<at::Tensor>",
+    # Needed for flash attention forw/backward
+    "::std::tuple<at::Tensor,at::Tensor,at::Tensor,at::Tensor,c10::SymInt,c10::SymInt,at::Tensor,at::Tensor,at::Tensor>",
+    "at::Scalar",
+    "bool",
+    "int64_t",
+    "void*",
+    "void",
+    "at::QScheme",
+    "double",
+    "at::IntArrayRef",
+    "at::ScalarType",
+    "at::Stream",
+}
+
+
+def dispatch_lambda_return_str(f: NativeFunction) -> str:
+    # [old codegen] Remove type annotation (e.g. 'Tensor' rather than 'Tensor &')
+    # because the dispatch lambdas take mutable arguments *by value*, not
+    # by reference. If you then return a reference to such an argument, you
+    # will now have a pointer to a dangling stack entry. Not good.
+    #
+    # You want:
+    #
+    #   auto dispatch_selu_ = [](Tensor self) -> Tensor { ...; return at::selu_(self); };
+    #                                            ^^^^^^
+    # *not*
+    #
+    #   auto dispatch_selu_ = [](Tensor self) -> Tensor& { ...; return at::selu_(self); };
+    #                                            ^^^^^^^
+    # (NB: We can't make dispatch_selu_ take Tensor&, because the enclosing
+    # codegen looks like dispatch_selu_(_r.tensor(0)), and you can't take a
+    # mutable reference to a temporary. Maybe we could assign it to a
+    # variable itself.)
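+    # For example, for "add.out(..., Tensor(a!) out) -> Tensor(a!)" the
+    # annotated return is rebuilt as a plain "Tensor" below, so the generated
+    # lambda returns "at::Tensor" by value rather than "at::Tensor &".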
+    returns_without_annotation = tuple(
+        Return(r.name, r.type, None) for r in f.func.returns
+    )
+    return_str = cpp.returns_type(returns_without_annotation, symint=True).cpp_type()
+    if return_str not in SUPPORTED_RETURN_TYPES:
+        raise RuntimeError(f"{f.func.name} returns unsupported type {return_str}")
+    return return_str
+
+
+def cpp_dispatch_target(f: NativeFunction) -> str:
+    symint = f.func.has_symint()
+    name = cpp.name(f.func, symint_overload=symint)
+    if Variant.method in f.variants:
+        return f"self.{name}"
+    if Variant.function in f.variants:
+        if has_tensor_options(f) or f.func.name.name.base.endswith("_like"):
+            namespace = "torch"
+        else:
+            namespace = "at"
+        return f"{namespace}::{name}"
+    raise RuntimeError(f"could not dispatch, neither function nor method: {f.func}")
+
+
+def cpp_dispatch_exprs(
+    f: NativeFunction,
+    *,
+    python_signature: PythonSignature | None = None,
+) -> tuple[str, ...]:
+    cpp_args: Sequence[Binding] = _cpp_signature(f, method=False).arguments()
+
+    exprs: tuple[str, ...] = ()
+    if not isinstance(python_signature, PythonSignatureDeprecated):
+        # By default the exprs are consistent with the C++ signature.
+        exprs = tuple(a.name for a in cpp_args)
+    else:
+        # For a deprecated python signature we may need to fill in some constants.
+        exprs = tuple(
+            filter(
+                lambda n: n != "out" or f.func.is_out_fn(),
+                python_signature.deprecated_args_exprs,
+            )
+        )
+
+    if Variant.method in f.variants:
+        exprs = tuple(filter("self".__ne__, exprs))
+
+    return exprs
+
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
+#
+#                       Python / C++ Args Binding
+#
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
+
+
+# We explicitly enumerate the PythonArgParser unpacking methods for all
+# supported types. This might be more verbose than necessary, partially
+# because of the irregularity of unpacking method naming, partially
+# because we want to mimic the old codegen behavior - to reject
+# unexpected and/or unsupported cases which the old codegen rejects.
+# For certain cases it is intentionally more restrictive than necessary,
+# e.g.: it doesn't accept a doublelist with definite size.
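+# As a rough illustration of the mapping implemented below (hypothetical
+# inputs, assuming the torchgen.model types imported above are in scope):
+#
+#   arg_parser_unpack_method(BaseType(BaseTy.bool), None, None)     -> "toBool"
+#   arg_parser_unpack_method(BaseType(BaseTy.bool), None, "false")  -> "toBoolWithDefault"
+#   arg_parser_unpack_method(OptionalType(BaseType(BaseTy.Tensor)), "None", None)
+#                                                                    -> "optionalTensor"
+#   arg_parser_unpack_method(ListType(BaseType(BaseTy.int), None), None, None)
+#                                                                    -> "intlist"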
+def arg_parser_unpack_method(
+    t: Type, default: str | None, default_init: str | None, *, symint: bool = True
+) -> str:
+    has_default_init = default_init is not None
+    if has_default_init and str(t) not in (
+        "ScalarType?",
+        "ScalarType",
+        "Device",
+        "Device?",
+        "Layout",
+        "Layout?",
+        "bool",
+        "bool?",
+    ):
+        raise RuntimeError(f"type '{t}' does not support unpacking with a default")
+
+    if isinstance(t, BaseType):
+        if t.name in [
+            BaseTy.Tensor,
+            BaseTy.Stream,
+            BaseTy.Storage,
+            BaseTy.Scalar,
+            BaseTy.Dimname,
+        ]:
+            # These unpack methods line up with their schema names
+            return t.name.name.lower()
+        elif t.name == BaseTy.ScalarType:
+            return "scalartypeWithDefault" if has_default_init else "scalartype"
+        elif t.name == BaseTy.Device:
+            return "deviceWithDefault" if has_default_init else "device"
+        elif t.name == BaseTy.DeviceIndex:
+            return "toInt64"
+        elif t.name == BaseTy.int:
+            return "toInt64"
+        elif t.name == BaseTy.SymInt:
+            return "toSymInt" if symint else "toInt64"
+        elif t.name == BaseTy.bool:
+            return "toBoolWithDefault" if has_default_init else "toBool"
+        elif t.name == BaseTy.float:
+            return "toDouble"
+        elif t.name == BaseTy.str:
+            return "stringView"
+        elif t.name == BaseTy.Layout:
+            return "layoutWithDefault" if has_default_init else "layout"
+        elif t.name == BaseTy.MemoryFormat:
+            return "memoryformat"
+
+    elif isinstance(t, OptionalType):
+        if str(t.elem) == "Tensor":
+            return "optionalTensor"
+        elif str(t.elem) == "Generator":
+            return "generator"
+        elif str(t.elem) == "Dimname[]":
+            return "toDimnameListOptional"
+        elif not has_default_init and default in (
+            None,
+            "None",
+            "::std::nullopt",
+            "std::nullopt",
+        ):
+            # If default is None: append 'Optional' to elem's unpacking method
+            return (
+                arg_parser_unpack_method(t.elem, None, None, symint=symint) + "Optional"
+            )
+        else:
+            # Otherwise, load as the underlying type with default
+            return arg_parser_unpack_method(
+                t.elem, default, default_init, symint=symint
+            )
+
+    elif isinstance(t, ListType):
+        if str(t.elem) == "Tensor":
+            # accept and use definite size
+            return f"tensorlist_n<{t.size}>" if t.size is not None else "tensorlist"
+        elif str(t.elem) == "Tensor?":
+            return "list_of_optional_tensors"
+        elif str(t.elem) == "Dimname":
+            # accept definite size
+            return "dimnamelist"
+        elif str(t.elem) == "int":
+            # accept definite size
+            return "intlist"
+        elif str(t.elem) == "float":
+            return "doublelist"
+        elif str(t.elem) == "SymInt":
+            # accept definite size
+            return "symintlist" if symint else "intlist"
+        elif str(t.elem) == "Scalar":
+            return "scalarlist"
+    raise RuntimeError(f"type '{t}' is not supported by PythonArgParser")
+
+
+# Return the RHS expression for a python argument using the PythonArgParser output.
+# e.g. for arg name 'foo', arg type 'bool', arg_index = 2, returns '_r.toBool(2)'
+def arg_parser_output_expr(
+    arg_index: int, a: PythonArgument, *, symint: bool = True
+) -> PythonArgParserOutputExpr:
+    has_default = a.default_init is not None
+    unpack_method = arg_parser_unpack_method(
+        t=a.type, default=a.default, default_init=a.default_init, symint=symint
+    )
+    default = f", {a.default_init}" if has_default else ""
+    expr = f"_r.{unpack_method}({arg_index}{default})"
+
+    return PythonArgParserOutputExpr(
+        name=a.name,
+        expr=expr,
+        index=arg_index,
+        argument=a,
+    )
+
+
+# Returns a map with key = arg_name and value = PythonArgParserOutputExpr.
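+# e.g. for "add(Tensor self, Tensor other, *, Scalar alpha=1)" this yields,
+# roughly: {"self": <_r.tensor(0)>, "other": <_r.tensor(1)>, "alpha": <_r.scalar(2)>},
+# where each value is the PythonArgParserOutputExpr wrapping that C++ expression.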
+def arg_parser_output_exprs(
+    ps: PythonSignature, f: NativeFunction, *, symint: bool = True
+) -> dict[str, PythonArgParserOutputExpr]:
+    return {
+        e.name: e
+        for i, a in enumerate(ps.arguments())
+        for e in (arg_parser_output_expr(i, a, symint=symint),)
+    }
+
+
+# argument name to type for scattered tensor options fields
+TENSOR_OPTIONS_FIELDS = {
+    "dtype": "ScalarType?",
+    "device": "Device?",
+    "layout": "Layout?",
+    "pin_memory": "bool?",
+    "requires_grad": "bool?",
+}
+
+
+# Bind arg parser outputs (python args) to dispatch lambda arguments (c++ args).
+def dispatch_lambda_exprs(
+    ps: PythonSignature, f: NativeFunction, *, symint: bool = True
+) -> DispatchLambdaArgumentExprs:
+    # This method binds 'arg_parser_outputs' to 'lambda_args' by producing
+    # 'inits' and 'lambda_args_exprs' for each lambda argument from the arg
+    # parser outputs.
+    arg_parser_outputs = arg_parser_output_exprs(ps, f, symint=symint)
+    lambda_args = dispatch_lambda_args(ps, f, symint=symint)
+    inits: list[str] = []
+    lambda_args_exprs: dict[str, str] = {}
+
+    has_toptions = has_tensor_options(f)
+
+    # 1. special inits/unpacking to provide binding exprs for lambda arguments.
+    for a in ps.arguments(skip_tensor_options=True):
+        name = a.name
+        arg_parser_expr = arg_parser_outputs[a.name].expr
+
+        if has_toptions and name == "self":
+            # TODO: why does this need to be a special case?
+            inits.extend(
+                [
+                    f"auto self = {arg_parser_expr};",
+                ]
+            )
+            lambda_args_exprs[name] = name
+        elif (
+            isinstance(a, PythonOutArgument)
+            and len(a.outputs) > 1
+            and f.func.is_out_fn()
+        ):
+            inits.extend(
+                [
+                    f"auto out = {arg_parser_expr};",
+                ]
+            )
+            for i, out_arg in enumerate(a.outputs):
+                lambda_args_exprs[out_arg.name] = f"out[{i}]"
+        elif str(a.type) == "Dimname[]?":
+            # [old codegen]
+            # TODO: make this part of something more general, or get rid of it.
+            # optional<ArrayRef<T>> are special. The PythonArgParser returns an
+            # optional<vector<T>>, which cannot be implicitly converted to
+            # optional<ArrayRef<T>>. One needs to unwrap the optional and rewrap.
+            inits.extend(
+                [
+                    f"auto __{name} = {arg_parser_expr};",
+                    f"::std::optional<DimnameList> {name} = __{name} ? ::std::make_optional(DimnameList(__{name}.value())) : ::std::nullopt;",  # noqa: B950
+                ]
+            )
+            lambda_args_exprs[name] = name
+        else:
+            # default case - directly use the PythonArgParser output expr
+            lambda_args_exprs[name] = arg_parser_expr
+
+    # method's self is passed directly to the python binding, rather than parsed
+    if ps.method:
+        lambda_args_exprs["self"] = "self"
+
+    # 2. special packing/checking for TensorOptions.
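+    # For a factory op such as "aten::zeros", the scattered python-side
+    # dtype/layout/device/pin_memory/requires_grad arguments are validated
+    # against TENSOR_OPTIONS_FIELDS and then packed into a single C++
+    # TensorOptions value below.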
+ tensor_options_args_names = [a.name for a in ps.tensor_options_args] + if has_toptions: + if f.func.is_out_fn(): + raise RuntimeError(f"{f.func}: tensor options with output arg") + for a in ps.tensor_options_args: + if a.name not in TENSOR_OPTIONS_FIELDS: + raise RuntimeError( + f"{f.func}: unrecognized tensor options field '{a.name}' in python binding arguments" + ) + if str(a.type) != TENSOR_OPTIONS_FIELDS.get(a.name): + raise RuntimeError( + f"{f.func}: unrecognized type '{str(a.type)}' for tensor options field '{a.name}'" + ) + if not all(a in tensor_options_args_names for a in TENSOR_OPTIONS_FIELDS): + raise RuntimeError( + f"{f.func}: incomplete tensor options args: {tensor_options_args_names}" + ) + + inits.append( + f"""\ +const auto options = TensorOptions() + .dtype({arg_parser_outputs["dtype"].expr}) + .device({arg_parser_outputs["device"].expr}) + .layout({arg_parser_outputs["layout"].expr}) + .requires_grad({arg_parser_outputs["requires_grad"].expr}) + .pinned_memory({arg_parser_outputs["pin_memory"].expr}); +torch::utils::maybe_initialize_device(options); +""" + ) + lambda_args_exprs["options"] = "options" + + # 3. special case - access scattered TensorOptions fields without packing + # TODO: maybe move to the generator side as it's not related to binding. + if not has_toptions and tensor_options_args_names: + if "dtype" in tensor_options_args_names: + # we're an output-arg variant, check these args against output tensor + if not f.func.is_out_fn(): + raise RuntimeError( + f"{f.func}: dtype in tensor_options_args without output arg, {ps} {ps.arguments}" + ) + if not all(a in tensor_options_args_names for a in ("layout", "device")): + raise RuntimeError( + f"{f.func}: incomplete tensor options for output check" + ) + + inits.append( + f"""\ +check_out_type_matches({arg_parser_outputs["out"].expr}, {arg_parser_outputs["dtype"].expr}, + {arg_parser_outputs["dtype"].is_none_expr}, {arg_parser_outputs["layout"].expr}, + {arg_parser_outputs["device"].expr}, {arg_parser_outputs["device"].is_none_expr}); +""" + ) + # we'll set requires_grad on outgoing tensor + if "requires_grad" not in tensor_options_args_names: + raise RuntimeError( + f'{f.func}: expected "requires_grad" in tensor_options_args absent, but found [{tensor_options_args_names}]' + ) + + return DispatchLambdaArgumentExprs( + exprs=tuple(lambda_args_exprs[a.name] for a in lambda_args), + inits=inits, + ) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/structured.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/structured.py new file mode 100644 index 0000000000000000000000000000000000000000..a0e14e5b69e6421fce5ddd247958876061d72b2c --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/structured.py @@ -0,0 +1,158 @@ +from __future__ import annotations + +from typing_extensions import assert_never + +from torchgen.api import cpp +from torchgen.api.types import ( + ArgName, + ArrayRefCType, + BaseCType, + Binding, + ConstRefCType, + dimnameListT, + intArrayRefT, + iOptTensorListRefT, + iTensorListRefT, + NamedCType, + OptionalCType, + optionalIntArrayRefT, + optionalScalarRefT, + optionalTensorRefT, + scalarT, + tensorT, +) +from torchgen.model import ( + Argument, + BaseTy, + BaseType, + ListType, + NativeFunctionsGroup, + OptionalType, + SelfArgument, + TensorOptionsArguments, + Type, +) + + +# This file describes the translation of JIT schema to the structured functions API. 
+# This is similar to native API, but a number of historical problems with native +# API have been fixed. + + +# Translation of types occurring in JIT arguments to a C++ argument type. +# NB: For now, mutable doesn't do anything; but it could if we make +# some more nominal types +def argumenttype_type(t: Type, *, mutable: bool, binds: ArgName) -> NamedCType: + # If it's a value type, do the value type translation + # NB: structured kernels ALWAYS have symint off, since they involve actual + # kernels that require real ints. The one exception is the + # CompositeExplicitAutograd and the meta function (which could + # hypothetically be SymInt), but for simplicity we plan for these to just + # be handled in Python + r = cpp.valuetype_type(t, symint=False, binds=binds, mutable=mutable) + if r is not None: + return r + + if isinstance(t, BaseType): + if t.name == BaseTy.Tensor: + return NamedCType(binds, ConstRefCType(BaseCType(tensorT))) + elif t.name == BaseTy.Scalar: + return NamedCType(binds, ConstRefCType(BaseCType(scalarT))) + else: + raise AssertionError(f"base type should have been value type {t}") + elif isinstance(t, OptionalType): + if t.elem == BaseType(BaseTy.Tensor): + return NamedCType(binds, BaseCType(optionalTensorRefT)) + elif t.elem == BaseType(BaseTy.Scalar): + return NamedCType(binds, BaseCType(optionalScalarRefT)) + elif isinstance(t.elem, ListType) and str(t.elem.elem) == "int": + return NamedCType(binds, BaseCType(optionalIntArrayRefT)) + elem = argumenttype_type(t.elem, mutable=mutable, binds=binds) + return NamedCType(binds, OptionalCType(elem.type)) + elif isinstance(t, ListType): + if t.elem == BaseType(BaseTy.Tensor): + return NamedCType(binds, ConstRefCType(BaseCType(iTensorListRefT))) + elif t.elem == OptionalType(BaseType(BaseTy.Tensor)): + return NamedCType(binds, BaseCType(iOptTensorListRefT)) + # TODO: delete these special cases; see torchgen.api.cpp--these + # must be changed in tandem, but there are problems; see + # https://github.com/pytorch/pytorch/pull/51485 + elif str(t.elem) == "int": + return NamedCType(binds, BaseCType(intArrayRefT)) + elif str(t.elem) == "Dimname": + return NamedCType(binds, BaseCType(dimnameListT)) + elem = argumenttype_type(t.elem, mutable=mutable, binds=binds) + return NamedCType(binds, ArrayRefCType(elem.type)) + else: + raise AssertionError(f"unrecognized type {repr(t)}") + + +def argument_type(a: Argument, *, binds: ArgName) -> NamedCType: + return argumenttype_type(a.type, mutable=a.is_write, binds=binds) + + +# returns_type intentionally omitted, because structured kernels never "return"; +# instead, they always indirectly report their outputs (in the case of a meta +# function, by calling set_output; in the case of an impl function, by writing +# directly into the provided out argument). 
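+# For reference, some of the JIT -> structured C++ translations implemented
+# above in argumenttype_type:
+#
+#   Tensor     -> const at::Tensor &
+#   Tensor?    -> at::OptionalTensorRef
+#   Scalar?    -> at::OptionalScalarRef
+#   int[]?     -> at::OptionalIntArrayRef
+#   Tensor[]   -> const at::ITensorListRef &
+#   Tensor?[]  -> at::IOptTensorListRef
+#   int[]      -> at::IntArrayRef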
+ + +# Structured kernels are never defaulted +def argument(a: Argument | SelfArgument | TensorOptionsArguments) -> list[Binding]: + if isinstance(a, Argument): + return [ + Binding( + nctype=argument_type(a, binds=a.name), + name=a.name, + default=None, + argument=a, + ) + ] + elif isinstance(a, SelfArgument): + return argument(a.argument) + elif isinstance(a, TensorOptionsArguments): + raise AssertionError("structured kernels don't support TensorOptions yet") + else: + assert_never(a) + + +def impl_arguments(g: NativeFunctionsGroup) -> list[Binding]: + args: list[Argument | TensorOptionsArguments | SelfArgument] = [] + + if g.out.precomputed: + # A list of parameters for the impl function with + # certain parameters replaced with precomputed counterparts + # as specified in native_functions.yaml. + non_out_args_replaced: list[ + Argument | TensorOptionsArguments | SelfArgument + ] = [] + for a in g.out.func.arguments.non_out: + if isinstance(a, Argument) and a.name in g.out.precomputed.replace: + # If a is in precompute.replace, append the parameters + # that should replace it onto non_out_args_replaced. + non_out_args_replaced.extend(g.out.precomputed.replace[a.name]) + else: + # If not, push a as it is. + non_out_args_replaced.append(a) + + args.extend(non_out_args_replaced) + # g.out.precomputed.add is the list of parameters that are added + # without replacement after the non out args and just before the out args + args.extend(g.out.precomputed.add) + else: + args.extend(g.out.func.arguments.non_out) + + args.extend(g.out.func.arguments.out) + return [r for arg in args for r in argument(arg)] + + +def meta_arguments(g: NativeFunctionsGroup) -> list[Binding]: + args: list[Argument | TensorOptionsArguments | SelfArgument] = [] + args.extend(g.functional.func.arguments.non_out) + return [r for arg in args for r in argument(arg)] + + +def out_arguments(g: NativeFunctionsGroup) -> list[Binding]: + args: list[Argument | TensorOptionsArguments | SelfArgument] = [] + args.extend(g.out.func.arguments.out) + return [r for arg in args for r in argument(arg)] diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/translate.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/translate.py new file mode 100644 index 0000000000000000000000000000000000000000..f98ce09bbfafb875a619ea01eae7b6f82d76ef71 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/translate.py @@ -0,0 +1,437 @@ +from __future__ import annotations + +from typing import NoReturn, TYPE_CHECKING + +from torchgen.api.types import ( + ArrayRefCType, + BaseCType, + Binding, + boolT, + ConstRefCType, + deviceT, + Expr, + intArrayRefT, + iOptTensorListRefT, + layoutT, + ListCType, + longT, + memoryFormatT, + MutRefCType, + NamedCType, + opmath_t, + OptionalCType, + optionalIntArrayRefT, + optionalScalarRefT, + optionalSymIntArrayRefT, + optionalTensorRefT, + scalar_t, + scalarT, + scalarTypeT, + SpecialArgName, + symIntArrayRefT, + SymIntT, + tensorOptionsT, + tensorT, + VectorCType, +) + + +if TYPE_CHECKING: + from collections.abc import Sequence + + +# This file implements a small program synthesis engine that implements +# conversions between one API to another. +# +# The key data type in this file in NamedCType, short for Named C++ semantic type. A NamedCType +# represents a C++ type, plus semantic information about what it represents. 
+# For example, consider the argument "bool pin_memory"; its normal C++ type is +# "bool", but its C++ semantic type also keeps track that this represents a +# "pin_memory"; you can't just use a random other boolean in a context where you +# need a "pin_memory"! +# +# The translator takes a list of needed NamedCTypes, and then figures out how +# to construct expressions with these NamedCTypes from the given bindings. Many +# of these expressions are trivial (I need a Tensor other; there's a Tensor +# other scope); others are more nontrivial and may require packing/unpacking. +# Some examples of non-trivial action: +# +# - Need the "dtype" binding? Well, maybe "dtype" isn't available +# in the context, instead, "options" is, and you need to extract +# it from there. (Gather) +# +# - Need the "context" binding? Well, maybe "context" isn't available +# in the context, and you need to construct it from "dtype", "device", +# etc. (Scatter) +# +# - Need the "memory_format" binding? Well, actually, it's available +# from both "memory_format" and "options", so you had better make sure +# they are consistent. (Join) + +options_ctype = NamedCType("options", ConstRefCType(BaseCType(tensorOptionsT))) + +out_tensor_ctype = NamedCType("out", ConstRefCType(BaseCType(tensorT))) + +longVec_ctype = VectorCType(BaseCType(longT)) +longSymVec_ctype = VectorCType(BaseCType(SymIntT)) +optionalLongVec_ctype = OptionalCType(VectorCType(BaseCType(longT))) +optionalScalar_ctype = OptionalCType(BaseCType(scalarT)) +optionalTensor_ctype = OptionalCType(BaseCType(tensorT)) + + +class UnsatError(RuntimeError): + pass + + +# Given a set of in-scope bindings and a set of target bindings, synthesize +# a list of expressions that uses only the in-scope bindings (bindings) that +# have all of the types of goals. You may want to use this function if +# you're generating code for a function like: +# +# void f({args}) { +# g({exprs}); // g is a different API +# } +# +# and you need to generate "exprs". +# +# Typically, a list of Bindings is convenient to get (you usually call something +# like arguments() to get them); but technically you only need less information: +# for 'bindings' an (un-ordered) list of Exprs is sufficient; similarly, for +# 'goals', an (ordered) list of NamedCType goals is sufficient. If you are doing +# something more complicated, e.g., tracking the set of bindings in a context, +# you may find using these smaller types more convenient. +def translate( + bindings: Sequence[Expr | Binding], + goals: Sequence[NamedCType | Binding], + *, + method: bool = False, + allow_expensive_conversions: bool = False, +) -> list[Expr]: + binding_exprs: list[Expr] = [] + for b in bindings: + if isinstance(b, Binding): + binding_exprs.append( + Expr( + expr=b.name, + type=b.nctype, + ) + ) + else: + binding_exprs.append(b) + + goal_ctypes: list[NamedCType] = [] + for g in goals: + if isinstance(g, Binding): + goal_ctypes.append(g.nctype) + else: + goal_ctypes.append(g) + + # Add all the bindings to the context + ctx: dict[NamedCType, str] = {} + for b in binding_exprs: + ctx[b.type] = b.expr + + # While we're at it, do some simple forward inference, looking through + # constructors. + # + # NB: When should you do forward inference versus backward inference? 
+ # The general idea: + # + # - Backward inference WHEN the goal gets smaller + # - Forward inference WHEN the hypothesis gets smaller + # + # This helps ensure termination: backward inference starts with a goal + # and tries to make it simpler and simpler until it's trivial; if the + # goal can grow in size, we blow up to a really huge goal size. + # Similarly, with forward inference we take hypotheses and decompose + # them into simpler hypotheses; if hypotheses could expand in size, + # we also have potential nontermination. (In the code below, forward + # inference is only ever carried out at a single step, but you could + # imagine repeated application of forward inference being profitable.) + # + # A good starting point in the literature for exploring more about proof + # search are these lecture notes + # https://www.cs.cmu.edu/~fp/courses/oregon-m10/04-focusing.pdf + # + # TODO: My kingdom for a pattern matcher + # https://www.python.org/dev/peps/pep-0634/ + # + # TODO: This could get us in recomputation trouble if b.expr is nontrivial. + # Fix this by implementing some sort of sharing so that if multiple + # goals share the same expression, we only compute it once. This seems + # to matter in practice as compiler is often unwilling to CSE nontrivial + # expressions like scalar.to() + t = b.type + if ( + isinstance(t, ConstRefCType) + and isinstance(t.elem, OptionalCType) + and isinstance(t.elem.elem, BaseCType) + and str(t.elem.elem.type) == "at::Tensor" + ): + ctx[NamedCType(t.elem.elem.name, ConstRefCType(BaseCType(tensorT)))] = ( + f"({b.expr}.has_value() ? *{b.expr} : at::Tensor())" + ) + + if t.type == ConstRefCType(OptionalCType(BaseCType(tensorT))): + ctx[NamedCType(t.name, BaseCType(optionalTensorRefT))] = ( + f"(({b.expr}.has_value() && (*{b.expr}).defined()) ? at::OptionalTensorRef(*{b.expr}) : at::OptionalTensorRef())" + ) + + if t.type == ConstRefCType(BaseCType(scalarT)): + ctx[NamedCType(t.name, BaseCType(opmath_t))] = f"({b.expr}).to()" + + if t.type == ConstRefCType(OptionalCType(BaseCType(scalarT))): + ctx[NamedCType(t.name, BaseCType(optionalScalarRefT))] = ( + f"({b.expr}.has_value() ? at::OptionalScalarRef(&({b.expr}.value())) : at::OptionalScalarRef())" + ) + + if t.type == BaseCType(scalar_t): + ctx[NamedCType(t.name, BaseCType(opmath_t))] = ( + f"static_cast({b.expr})" + ) + + # [Note: IOptTensorListRef] + if t.type == ConstRefCType(ListCType(OptionalCType(BaseCType(tensorT)))): + ctx[NamedCType(t.name, BaseCType(iOptTensorListRefT))] = ( + f"at::IOptTensorListRef({b.expr})" + ) + + # Add implicit bindings if the generated code is inside a Tensor method + if method: + ctx[NamedCType("self", MutRefCType(BaseCType(tensorT)))] = ( + "const_cast(*this)" + ) + ctx[NamedCType("self", ConstRefCType(BaseCType(tensorT)))] = ( + "const_cast(*this)" + ) + # This is better! Byte-for-byte compat + # ctx[NamedCType("self", ConstRefCType(BaseCType(tensorT)))] = "*this" + + def unsat(goal: NamedCType) -> NoReturn: + ctx_desc = "\n".join( + f" {t.cpp_type()} {t.name}; // {e}" for t, e in ctx.items() + ) + raise UnsatError( + f""" +Failed to synthesize the expression "{goal.cpp_type()} {goal.name}". +When I failed, the following bindings were available in the context: + +{ctx_desc} + +This probably means there is a missing rule in the rules of torchgen.api.translate. +Check this module for more information. +""" + ) + + # A shitty backtracking search implementation. It's shitty because it + # does backtracking via stack (bad idea!) 
and for the most part tries to + # avoid backtracking. In particular, if + # direct=True, we won't try to do any fancy synthesis, just trivial + # conversions (e.g., "T a" is OK for "const T& a"). So all of the + # existing rules in this function simply try to solve immediately, + # and bail if things don't work out. + def solve(goal: NamedCType, *, direct: bool) -> str: + def direct_solve(goal: NamedCType) -> str: + return solve(goal, direct=True) + + if goal in ctx: + # Trivial + return ctx[goal] + + # const & is satisfied with mutable & + if isinstance(goal.type, ConstRefCType): + try: + # WARNING: not strictly decreasing; be careful not + # to add a direct conversion that goes satisfies + # mutable& with const& + return solve( + NamedCType(goal.name, MutRefCType(goal.type.elem)), direct=direct + ) + except UnsatError: + pass + + # mutable & is satisfied with value + if isinstance(goal.type, MutRefCType): + try: + return solve(NamedCType(goal.name, goal.type.elem), direct=direct) + except UnsatError: + pass + + # TODO: These are referentially equal, shouldn't have to do this; + # ensuring we don't use type synonym IntArrayRef in codegen would + # help + if goal.type == ArrayRefCType(BaseCType(longT)): + return solve(NamedCType(goal.name, BaseCType(intArrayRefT)), direct=direct) + + if direct: + unsat(goal) + + # For now, all of these rules are mutually exclusive. + if goal == NamedCType("memory_format", OptionalCType(BaseCType(memoryFormatT))): + memory_format = direct_solve( + NamedCType( + SpecialArgName.possibly_redundant_memory_format, + OptionalCType(BaseCType(memoryFormatT)), + ) + ) + # No need to join "memory_format" and "options" if the target API takes "options" directly. + # Otherwise it will cause the redundant memory_format error. + if options_ctype in goal_ctypes: + return memory_format + try: + options = direct_solve(options_ctype) + return f"c10::impl::check_tensor_options_and_extract_memory_format({options}, {memory_format})" + except UnsatError: + return memory_format + elif goal == NamedCType("options", BaseCType(tensorOptionsT)): + dtype = direct_solve( + NamedCType("dtype", OptionalCType(BaseCType(scalarTypeT))) + ) + pin_memory = direct_solve( + NamedCType("pin_memory", OptionalCType(BaseCType(boolT))) + ) + device = direct_solve( + NamedCType("device", OptionalCType(BaseCType(deviceT))) + ) + layout = direct_solve( + NamedCType("layout", OptionalCType(BaseCType(layoutT))) + ) + return f"TensorOptions().dtype({dtype}).layout({layout}).device({device}).pinned_memory({pin_memory})" + + elif goal == NamedCType("dtype", OptionalCType(BaseCType(scalarTypeT))): + try: + options = direct_solve(options_ctype) + return f"c10::optTypeMetaToScalarType({options}.dtype_opt())" + except UnsatError: + out_tensor = direct_solve(out_tensor_ctype) + return f"{out_tensor}.scalar_type()" + + elif goal == NamedCType("layout", OptionalCType(BaseCType(layoutT))): + try: + options = direct_solve(options_ctype) + return f"{options}.layout_opt()" + except UnsatError: + out_tensor = direct_solve(out_tensor_ctype) + return f"{out_tensor}.layout()" + + elif goal == NamedCType("device", OptionalCType(BaseCType(deviceT))): + try: + options = direct_solve(options_ctype) + return f"{options}.device_opt()" + except UnsatError: + out_tensor = direct_solve(out_tensor_ctype) + return f"{out_tensor}.device()" + + elif goal == NamedCType("pin_memory", OptionalCType(BaseCType(boolT))): + try: + options = direct_solve(options_ctype) + return f"{options}.pinned_memory_opt()" + except UnsatError: + # If 
+                # we're calling a factory op from its out= variant,
+                # we don't actually care about the value of pin_memory.
+                out_tensor = direct_solve(out_tensor_ctype)
+                return "::std::nullopt"
+
+        # We can always do translations from value types to reference types,
+        # e.g. vector<int64_t> -> IntArrayRef
+        elif goal.type == BaseCType(intArrayRefT):
+            try:
+                return direct_solve(NamedCType(goal.name, longVec_ctype))
+            except UnsatError:
+                # We can also go SymIntArrayRef -> IntArrayRef
+                symIntArrayRef_type = direct_solve(
+                    NamedCType(goal.name, BaseCType(symIntArrayRefT))
+                )
+                return f"C10_AS_INTARRAYREF_SLOW({symIntArrayRef_type})"
+        elif goal.type == BaseCType(symIntArrayRefT):
+            try:
+                r = direct_solve(NamedCType(goal.name, BaseCType(intArrayRefT)))
+                return f"c10::fromIntArrayRefSlow({r})"
+            except UnsatError:
+                return direct_solve(NamedCType(goal.name, longSymVec_ctype))
+        elif goal.type == BaseCType(SymIntT):
+            return direct_solve(NamedCType(goal.name, BaseCType(longT)))
+        elif goal.type == OptionalCType(BaseCType(SymIntT)):
+            argname = direct_solve(
+                NamedCType(goal.name, OptionalCType(BaseCType(longT)))
+            )
+            return f"{argname}.has_value() ? ::std::make_optional(c10::SymInt(*{argname})) : ::std::nullopt"
+        elif goal.type == BaseCType(longT):
+            symInt_type = direct_solve(NamedCType(goal.name, BaseCType(SymIntT)))
+            return f"{symInt_type}.guard_int(__FILE__, __LINE__)"
+        elif goal.type == OptionalCType(BaseCType(longT)):
+            argname = direct_solve(
+                NamedCType(goal.name, OptionalCType(BaseCType(SymIntT)))
+            )
+            return f"{argname}.has_value() ? ::std::make_optional({argname}->guard_int(__FILE__, __LINE__)) : ::std::nullopt"
+        elif goal.type == BaseCType(optionalIntArrayRefT):
+            try:
+                return direct_solve(NamedCType(goal.name, optionalLongVec_ctype))
+            except UnsatError:
+                argname = direct_solve(
+                    NamedCType(goal.name, BaseCType(optionalSymIntArrayRefT))
+                )
+                return f"{argname}.has_value() ? ::std::make_optional(C10_AS_INTARRAYREF_SLOW(*{argname})) : ::std::nullopt"
+        elif goal.type == BaseCType(optionalSymIntArrayRefT):
+            # TODO: You might also want to solve this from longSymVec_ctype or
+            # an optional version of it
+            argname = direct_solve(
+                NamedCType(goal.name, BaseCType(optionalIntArrayRefT))
+            )
+            return f"{argname}.has_value() ? ::std::make_optional(c10::fromIntArrayRefSlow(*{argname})) : ::std::nullopt"
+        elif goal.type == BaseCType(optionalScalarRefT):
+            return direct_solve(NamedCType(goal.name, optionalScalar_ctype))
+        elif goal.type == BaseCType(optionalTensorRefT):
+            return direct_solve(NamedCType(goal.name, optionalTensor_ctype))
+
+        # Note [translation from C++ reference to value types]
+        # The below cases are all for when we have an argument with a reference type,
+        # and a corresponding goal with a value type.
+        # These are needed when we populate the inputs to a lambda capture and we need
+        # to guarantee the lifetime of each captured argument.
+        # We guard it with an explicit kwarg because converting to a value type is
+        # expensive (O(n) to convert from IntArrayRef to vector<int64_t>),
+        # so the caller of translate() should be explicit that they need it.
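+        # e.g. with allow_expensive_conversions=True, a goal of
+        # "::std::vector<int64_t> sizes" can be satisfied from an in-scope
+        # "at::IntArrayRef sizes" binding via "sizes.vec()" (handled below);
+        # without the flag, translate() raises UnsatError rather than paying
+        # for the O(n) copy.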
+        if allow_expensive_conversions:
+            if goal.type == VectorCType(BaseCType(longT)):
+                intArrayRef_ctype = NamedCType(goal.name, BaseCType(intArrayRefT))
+                argname = direct_solve(intArrayRef_ctype)
+                return f"{argname}.vec()"
+            elif goal.type == VectorCType(BaseCType(SymIntT)):
+                symIntArrayRef_ctype = NamedCType(goal.name, BaseCType(symIntArrayRefT))
+                argname = direct_solve(symIntArrayRef_ctype)
+                return f"{argname}.vec()"
+            elif goal.type == OptionalCType(VectorCType(BaseCType(longT))):
+                optionalIntArrayRef_ctype = NamedCType(
+                    goal.name, BaseCType(optionalIntArrayRefT)
+                )
+                argname = direct_solve(optionalIntArrayRef_ctype)
+                return f"{argname}.has_value() ? ::std::make_optional({argname}->vec()) : ::std::nullopt"
+            elif goal.type == OptionalCType(BaseCType(scalarT)):
+                optionalScalarRef_ctype = NamedCType(
+                    goal.name, BaseCType(optionalScalarRefT)
+                )
+                argname = direct_solve(optionalScalarRef_ctype)
+                return f"{argname}.has_value() ? ::std::make_optional({argname}) : ::std::nullopt"
+            elif goal.type == OptionalCType(BaseCType(tensorT)):
+                optionalTensorRef_ctype = NamedCType(
+                    goal.name, BaseCType(optionalTensorRefT)
+                )
+                argname = direct_solve(optionalTensorRef_ctype)
+                return f"{argname}.has_value() ? ::std::make_optional({argname}) : ::std::nullopt"
+            # Technically, we also need to handle cases of C++ containers holding reference types.
+            # But there currently aren't any ops that require lambda capture codegen
+            # with arguments like a ::std::vector of reference types.
+            # If that changes, we'll have to add the translation here.
+
+        # We allow const casting on tensors, since const-correctness is a bit broken
+        # for at::Tensor. We could probably generalize this to non-tensor types too.
+        if goal.type == MutRefCType(BaseCType(tensorT)):
+            const_ref_tensor_ctype = NamedCType(
+                goal.name, ConstRefCType(BaseCType(tensorT))
+            )
+            argname = direct_solve(const_ref_tensor_ctype)
+            return f"const_cast<Tensor&>({argname})"
+
+        unsat(goal)
+
+    return [Expr(solve(g, direct=False), g) for g in goal_ctypes]
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..4e98bb8df493f2375b514e6c6aeb897cebe8ec7d
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/__init__.py
@@ -0,0 +1,5 @@
+from torchgen.api.types.types import *
+from torchgen.api.types.types_base import *
+
+
+from torchgen.api.types.signatures import * # usort: skip
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/__pycache__/__init__.cpython-312.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..887809c89ed8aabff77134784f8ab1b61ca8d0a6
Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/__pycache__/__init__.cpython-312.pyc differ
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/__pycache__/signatures.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/__pycache__/signatures.cpython-312.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..99688f2cfb32f3eb673652409b7871b3290c751b
Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/__pycache__/signatures.cpython-312.pyc differ
diff --git
a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/__pycache__/types.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/__pycache__/types.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..84109bbef89e91718fb97dc485a7dedaf6c633b0 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/__pycache__/types.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/__pycache__/types_base.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/__pycache__/types_base.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..69556c46e3e37f2e60b40e5581f9fd289f202613 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/__pycache__/types_base.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/signatures.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/signatures.py new file mode 100644 index 0000000000000000000000000000000000000000..d4a47536dd1ff213bc8bd8aceee2bd22531088a6 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/signatures.py @@ -0,0 +1,356 @@ +from __future__ import annotations + +from dataclasses import dataclass +from typing import TYPE_CHECKING + +from torchgen.api.types.types_base import Binding, CType, Expr + + +if TYPE_CHECKING: + from collections.abc import Iterator, Sequence + + from torchgen.model import ( + BackendIndex, + FunctionSchema, + NativeFunction, + NativeFunctionsGroup, + NativeFunctionsViewGroup, + ) + + +@dataclass(frozen=True) +class CppSignature: + """ + A CppSignature represents a single overload in the C++ API. For + any given function schema, there may be multiple CppSignatures + corresponding to it, based on how we desugar to C++. See also + CppSignatureGroup. + """ + + # The schema this signature is derived from + func: FunctionSchema + + # Is this a C++ signature for a method, i.e. Tensor::my_op(...)? + method: bool + + # Is this a faithful C++ signature (i.e. following the JIT schema) or a convenience API + # (i.e. with a potential TensorOptions argument and out arguments in the front) + faithful: bool + + # Is this a symint C++ signature. For BC reasons, functions that take + # SymInts still present as int64_t in C++, and the SymInt variant is + # offered at a different overload name + # + # NB: If a function RETURNS a SymInt, this is ALWAYS false + symint: bool + + # The set of C++ arguments which should not have defaults applied to them + cpp_no_default_args: set[str] + + # Is this a fallback C++ binding? Fallback bindings are enabled by + # manual_cpp_binding: True and are alternate, non-public API that + # lets manual C++ binding implementers access the binding that would + # have been automatically generated + fallback_binding: bool = False + + # Return the unpacked argument structure of this signature, + # discarding information about which arguments are semantically + # related to each other. 
+ def arguments(self) -> Sequence[Binding]: + return cpp.arguments( + self.func.arguments, + faithful=self.faithful, + symint=self.symint, + method=self.method, + cpp_no_default_args=self.cpp_no_default_args, + ) + + def name(self, *, suppress_symint_suffix: bool = False) -> str: + n = cpp.name( + self.func, + faithful_name_for_out_overloads=self.faithful, + symint_overload=False if suppress_symint_suffix else self.symint, + ) + if self.fallback_binding: + n = f"__dispatch_{n}" + return n + + # Render the C++ declaration for this signature + def decl( + self, + *, + name: str | None = None, + prefix: str = "", + is_redispatching_fn: bool = False, + suppress_symint_suffix: bool = False, + ) -> str: + returns_type = cpp.returns_type( + self.func.returns, symint=self.symint + ).cpp_type() + cpp_args = [a.decl() for a in self.arguments()] + if is_redispatching_fn: + cpp_args = ["c10::DispatchKeySet dispatchKeySet"] + cpp_args + cpp_args_str = ", ".join(cpp_args) + if name is None: + name = prefix + self.name(suppress_symint_suffix=suppress_symint_suffix) + return f"{returns_type} {name}({cpp_args_str})" + + # Render the C++ definition for this signature, not including + # the body (with curly braces) + def defn( + self, + *, + name: str | None = None, + prefix: str = "", + is_redispatching_fn: bool = False, + ) -> str: + returns_type = cpp.returns_type( + self.func.returns, symint=self.symint + ).cpp_type() + cpp_args = [a.defn() for a in self.arguments()] + if is_redispatching_fn: + cpp_args = ["c10::DispatchKeySet dispatchKeySet"] + cpp_args + cpp_args_str = ", ".join(cpp_args) + if name is None: + name = prefix + self.name() + return f"{returns_type} {name}({cpp_args_str})" + + def ptr_type(self) -> str: + args_types_str = ", ".join(a.type for a in self.arguments()) + return f"{cpp.returns_type(self.func.returns, symint=self.symint).cpp_type()} (*)({args_types_str})" + + # Return the C++ function type, e.g., something like int(bool) + def type(self) -> str: + args_types_str = ", ".join(a.type for a in self.arguments()) + return f"{cpp.returns_type(self.func.returns, symint=self.symint).cpp_type()} ({args_types_str})" + + +# Represents group of all CppSignatures associated with a +# FunctionSchema. Right now, that's the regular, user-visible +# signature, as well as a "faithful" signature which doesn't +# have grouping. 
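+# For example, for "aten::add.out(Tensor self, Tensor other, *, Scalar alpha=1,
+# Tensor(a!) out) -> Tensor(a!)", the group holds the convenience signature,
+# which hoists 'out' to the front, and the faithful one, which keeps the JIT
+# schema order (a sketch of the usual generated overloads):
+#
+#   at::Tensor & add_out(at::Tensor & out, const at::Tensor & self,
+#                        const at::Tensor & other, const at::Scalar & alpha);
+#   at::Tensor & add_outf(const at::Tensor & self, const at::Tensor & other,
+#                         const at::Scalar & alpha, at::Tensor & out);
+#
+# Ops with SymInt arguments can additionally carry symint variants of both
+# (see make_sigs in from_native_function below).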
+@dataclass(frozen=True) +class CppSignatureGroup: + func: FunctionSchema + signature: CppSignature + faithful_signature: CppSignature | None + symint_signature: CppSignature | None + symint_faithful_signature: CppSignature | None + + def most_faithful_signature(self) -> CppSignature: + if self.faithful_signature: + return self.faithful_signature + else: + return self.signature + + def signatures(self, *, symint: bool = True) -> Iterator[CppSignature]: + yield self.signature + if self.faithful_signature: + yield self.faithful_signature + if symint: + if self.symint_signature: + yield self.symint_signature + if self.symint_faithful_signature: + yield self.symint_faithful_signature + + @staticmethod + def from_native_function( + f: NativeFunction, *, method: bool, fallback_binding: bool = False + ) -> CppSignatureGroup: + func = f.func + + def make_sig(*, faithful: bool, symint: bool) -> CppSignature: + return CppSignature( + func=func, + faithful=faithful, + symint=symint, + method=method, + fallback_binding=fallback_binding, + cpp_no_default_args=f.cpp_no_default_args, + ) + + def make_sigs(*, symint: bool) -> tuple[CppSignature, CppSignature | None]: + faithful_signature: CppSignature | None = None + if func.arguments.tensor_options is not None or len(func.arguments.out) > 0: + faithful_signature = make_sig(faithful=True, symint=symint) + signature = make_sig(faithful=False, symint=symint) + return signature, faithful_signature + + signature, faithful_signature = make_sigs(symint=False) + symint_signature: CppSignature | None = None + symint_faithful_signature: CppSignature | None = None + if func.has_symint(): + symint_signature, symint_faithful_signature = make_sigs(symint=True) + + return CppSignatureGroup( + func=func, + signature=signature, + faithful_signature=faithful_signature, + symint_signature=symint_signature, + symint_faithful_signature=symint_faithful_signature, + ) + + +@dataclass(frozen=True) +class DispatcherSignature: + # The schema this signature is derived from + func: FunctionSchema + + # Allows you to prepend an arbitrary prefix to the signature name. + # This is useful for parts of the codegen that generate wrappers around kernels, + # and need to avoid naming collisions. 
+ prefix: str = "" + + symint: bool = True + + def arguments(self) -> list[Binding]: + return dispatcher.arguments(self.func, symint=self.symint) + + def name(self) -> str: + return self.prefix + dispatcher.name(self.func) + + def decl(self, name: str | None = None) -> str: + args_str = ", ".join(a.decl() for a in self.arguments()) + if name is None: + name = self.name() + return f"{self.returns_type().cpp_type()} {name}({args_str})" + + def defn( + self, name: str | None = None, *, is_redispatching_fn: bool = False + ) -> str: + args = [a.defn() for a in self.arguments()] + if is_redispatching_fn: + args = ["c10::DispatchKeySet dispatchKeySet"] + args + args_str = ", ".join(args) + if name is None: + name = self.name() + return f"{self.returns_type().cpp_type()} {name}({args_str})" + + def exprs(self) -> list[Expr]: + return [Expr(a.name, a.nctype) for a in self.arguments()] + + def returns_type(self) -> CType: + return dispatcher.returns_type(self.func.returns, symint=self.symint) + + def ptr_type(self) -> str: + dispatcher_args_types_str = ", ".join(a.type for a in self.arguments()) + return f"{self.returns_type().cpp_type()} (*)({dispatcher_args_types_str})" + + # Return the C++ function type, e.g., something like int(bool) + def type(self) -> str: + dispatcher_args_types_str = ", ".join(a.type for a in self.arguments()) + return f"{self.returns_type().cpp_type()} ({dispatcher_args_types_str})" + + @staticmethod + def from_schema( + func: FunctionSchema, *, prefix: str = "", symint: bool = True + ) -> DispatcherSignature: + return DispatcherSignature(func, prefix, symint) + + +@dataclass(frozen=True) +class NativeSignature: + # The schema this signature is derived from + func: FunctionSchema + + symint: bool + + prefix: str = "" + + def name(self) -> str: + return self.prefix + native.name(self.func) + + def decl(self, name: str | None = None) -> str: + args_str = ", ".join(a.decl() for a in self.arguments()) + if name is None: + name = self.name() + return f"{native.returns_type(self.func.returns, symint=self.symint).cpp_type()} {name}({args_str})" + + def defn(self, name: str | None = None) -> str: + args_str = ", ".join(a.defn() for a in self.arguments()) + if name is None: + name = self.name() + return f"{native.returns_type(self.func.returns, symint=self.symint).cpp_type()} {name}({args_str})" + + def ptr_type(self) -> str: + # don't include defaults in type signature! 
+ args_str = ", ".join(a.defn() for a in self.arguments()) + return f"{native.returns_type(self.func.returns, symint=self.symint).cpp_type()} (*)({args_str})" + + def arguments(self) -> list[Binding]: + return native.arguments(self.func, symint=self.symint) + + def returns_type(self) -> CType: + return native.returns_type(self.func.returns, symint=self.symint) + + def dispatcher_exprs(self) -> list[Expr]: + return translate.translate( + self.arguments(), dispatcher.arguments(self.func), method=False + ) + + +@dataclass(frozen=True) +class ViewInverseSignature: + g: NativeFunctionsViewGroup + + def name(self) -> str: + return functionalization.reverse_name(self.g.view, include_namespace=False) + + def decl(self) -> str: + return_type = functionalization.returns_type(self.g.view.func) + decls = [ + a.decl() + for a in functionalization.op_arguments(self.g.view.func, is_reverse=True) + ] + return f"static {return_type.cpp_type()} {self.name()}({', '.join(decls)});" + + +@dataclass(frozen=True) +class StructuredImplSignature: + g: NativeFunctionsGroup + name: str + + def defn(self, name: str | None = None) -> str: + args_str = ", ".join(a.defn() for a in self.arguments()) + return f"TORCH_IMPL_FUNC({self.name})({args_str})" + + def arguments(self) -> list[Binding]: + return structured.impl_arguments(self.g) + + +# Helper functions + + +def kernel_signature( + f: NativeFunction, backend_index: BackendIndex, *, prefix: str = "" +) -> NativeSignature | DispatcherSignature: + # Note [External Backends Follow Dispatcher API] + # Kernel signatures for in-tree backends follow the "native" API, + # while kernels for out-of-tree backends follow the dispatcher API. + # See the comments in `native.py` for details, but historically there have been + # some small differences in schema convention between them and the Dispatcher API. + # Any differences that require translating between the two will results in a runtime cost, + # so we'd like to keep the differences as small as possible. + # With external backends, we'd like to enforce that they write their kernels with schemas + # that match the Dispatcher API directly, if they can. + meta = backend_index.get_kernel(f) + symint = meta is not None and meta.supports_symint() + if symint: + assert f.func.has_symint(), ( + f"attempted to define symint kernel for {backend_index.dispatch_key} without SymInt in schema" + ) + if backend_index.external: + return DispatcherSignature.from_schema(f.func, prefix=prefix, symint=symint) + else: + return NativeSignature(f.func, prefix=prefix, symint=symint) + + +# Functions only, no types +from torchgen.api import ( + cpp, + dispatcher, + functionalization, + native, + structured, + translate, +) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/types.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/types.py new file mode 100644 index 0000000000000000000000000000000000000000..41c05653fffdf3d04fc7078e7df142124ed96e00 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/types.py @@ -0,0 +1,183 @@ +""" +Where should I add a new type? `types_base.py` vs `types.py` + +This file defines data model classes for torchgen typing system, as well as some base types such as int32_t. + +`types.py` defines ATen Tensor type and some c10 types, along with signatures that use these types. 
+
+The difference between these two files is that `types_base.py` should be
+implementation-agnostic: it shouldn't contain any type definition that is
+tied to a specific C++ library (e.g., ATen), so that it can be easily reused
+if we want to generate code for another C++ library.
+
+Add new types to `types.py` if these types are ATen/c10 related.
+Add new types to `types_base.py` if they are basic and not attached to ATen/c10.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+from torchgen.api.types.types_base import (
+    BaseCppType,
+    BaseCType,
+    boolT,
+    byteT,
+    charT,
+    CType,
+    doubleT,
+    floatT,
+    int32T,
+    longT,
+    shortT,
+)
+from torchgen.model import BaseTy, ScalarType
+
+
+TENSOR_LIST_LIKE_CTYPES = [
+    "at::TensorList",
+    "const c10::List<::std::optional<at::Tensor>> &",
+    "const at::ITensorListRef &",
+]
+
+
+halfT = BaseCppType("at", "Half")
+complexHalfT = BaseCppType(
+    "c10", "complex<c10::Half>"
+)  # stuffing template param here is an abuse
+complexFloatT = BaseCppType("c10", "complex<float>")
+complexDoubleT = BaseCppType("c10", "complex<double>")
+bfloat16T = BaseCppType("at", "BFloat16")
+float8_e5m2T = BaseCppType("at", "Float8_e5m2")
+float8_e5m2fnuzT = BaseCppType("at", "Float8_e5m2fnuz")
+float8_e4m3fnT = BaseCppType("at", "Float8_e4m3fn")
+float8_e4m3fnuzT = BaseCppType("at", "Float8_e4m3fnuz")
+float8_e8m0fnuT = BaseCppType("at", "Float8_e8m0fnu")
+stringT = BaseCppType("c10", "string_view")
+generatorT = BaseCppType("at", "Generator")
+scalarTypeT = BaseCppType("at", "ScalarType")
+tensorT = BaseCppType("at", "Tensor")
+optionalTensorRefT = BaseCppType("at", "OptionalTensorRef")
+tensorListT = BaseCppType("at", "TensorList")
+iTensorListRefT = BaseCppType("at", "ITensorListRef")
+iOptTensorListRefT = BaseCppType("at", "IOptTensorListRef")
+dimnameT = BaseCppType("at", "Dimname")
+dimnameListT = BaseCppType("at", "DimnameList")
+dimVectorT = BaseCppType("at", "DimVector")
+layoutT = BaseCppType("at", "Layout")
+deviceT = BaseCppType("at", "Device")
+deviceIndexT = BaseCppType("at", "DeviceIndex")
+scalarT = BaseCppType("at", "Scalar")
+optionalScalarRefT = BaseCppType("at", "OptionalScalarRef")
+memoryFormatT = BaseCppType("at", "MemoryFormat")
+qschemeT = BaseCppType("at", "QScheme")
+storageT = BaseCppType("at", "Storage")
+streamT = BaseCppType("at", "Stream")
+intArrayRefT = BaseCppType("at", "IntArrayRef")
+optionalIntArrayRefT = BaseCppType("at", "OptionalIntArrayRef")
+optionalSymIntArrayRefT = BaseCppType("at", "OptionalSymIntArrayRef")
+tensorOptionsT = BaseCppType("at", "TensorOptions")
+typeAndSizeT = BaseCppType("torch::autograd::generated", "TypeAndSize")
+tensorGeometryT = BaseCppType("at", "TensorGeometry")
+SymIntT = BaseCppType("c10", "SymInt")
+SymBoolT = BaseCppType("c10", "SymBool")
+symIntArrayRefT = BaseCppType("c10", "SymIntArrayRef")
+
+# Types representing template parameters. Technically, we probably shouldn't
+# represent them this way in codegen, but it was pretty convenient.
+scalar_t = BaseCppType("", "scalar_t") +opmath_t = BaseCppType("", "opmath_t") + +ScalarTypeToCppMapping: dict[ScalarType, BaseCppType] = { + ScalarType.Byte: byteT, + ScalarType.Char: charT, + ScalarType.Short: shortT, + ScalarType.Int: int32T, + ScalarType.Long: longT, + ScalarType.Half: halfT, + ScalarType.Float: floatT, + ScalarType.Double: doubleT, + ScalarType.ComplexHalf: complexHalfT, + ScalarType.ComplexFloat: complexFloatT, + ScalarType.ComplexDouble: complexDoubleT, + ScalarType.Bool: boolT, + ScalarType.Float8_e5m2: float8_e5m2T, + ScalarType.Float8_e5m2fnuz: float8_e5m2fnuzT, + ScalarType.Float8_e4m3fn: float8_e4m3fnT, + ScalarType.Float8_e4m3fnuz: float8_e4m3fnuzT, + ScalarType.Float8_e8m0fnu: float8_e8m0fnuT, +} + +BaseTypeToCppMapping: dict[BaseTy, BaseCppType] = { + BaseTy.int: longT, + BaseTy.float: doubleT, + BaseTy.bool: boolT, + BaseTy.str: stringT, + BaseTy.Generator: generatorT, + BaseTy.ScalarType: scalarTypeT, + BaseTy.Tensor: tensorT, + BaseTy.Dimname: dimnameT, + BaseTy.DimVector: dimVectorT, + BaseTy.Layout: layoutT, + BaseTy.Device: deviceT, + BaseTy.DeviceIndex: deviceIndexT, + BaseTy.Scalar: scalarT, + BaseTy.MemoryFormat: memoryFormatT, + BaseTy.QScheme: qschemeT, + BaseTy.Storage: storageT, + BaseTy.Stream: streamT, + BaseTy.SymInt: SymIntT, + BaseTy.SymBool: SymBoolT, +} + +# CTypes encode C++ type structure as needed for translation. + + +@dataclass(frozen=True) +class OptionalCType(CType): + elem: CType + + def cpp_type(self, *, strip_ref: bool = False) -> str: + # Do not pass `strip_ref` recursively. + return f"::std::optional<{self.elem.cpp_type()}>" + + def remove_const_ref(self) -> CType: + return OptionalCType(self.elem.remove_const_ref()) + + +@dataclass(frozen=True) +class ListCType(CType): + elem: CType + + def cpp_type(self, *, strip_ref: bool = False) -> str: + # Do not pass `strip_ref` recursively. + return f"c10::List<{self.elem.cpp_type()}>" + + def remove_const_ref(self) -> CType: + return ListCType(self.elem.remove_const_ref()) + + +@dataclass(frozen=True) +class ArrayRefCType(CType): + elem: CType + + def cpp_type(self, *, strip_ref: bool = False) -> str: + # Do not pass `strip_ref` recursively. + return f"at::ArrayRef<{self.elem.cpp_type()}>" + + def remove_const_ref(self) -> CType: + return ArrayRefCType(self.elem.remove_const_ref()) + + +@dataclass(frozen=True) +class VectorizedCType(CType): + # This template is explicitly specialized, so the only valid + # elems are those we have specializations for (e.g., float, double, ...) + # scalar_t is also a common argument here (when we are codegen in + # a templated context) + elem: BaseCType + + def cpp_type(self, *, strip_ref: bool = False) -> str: + return f"at::vec::Vectorized<{self.elem.cpp_type()}>" + + def remove_const_ref(self) -> CType: + return self diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/types_base.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/types_base.py new file mode 100644 index 0000000000000000000000000000000000000000..08085fa0fa2bf04b3be6d9a9b8c411c9bbfed6d8 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/types/types_base.py @@ -0,0 +1,238 @@ +""" +Where should I add a new type? `types_base.py` vs `types.py` + +This file defines data model classes for torchgen typing system, as well as some base types such as int32_t. + +`types.py` defines ATen Tensor type and some c10 types, along with signatures that use these types. 
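+
+For example: plain C++ types such as int32_t or bool belong here in
+`types_base.py`, while ATen/c10 types such as at::Tensor or c10::SymInt belong
+in `types.py`.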
+
+The difference between the two files is that `types_base.py` should be implementation-agnostic: it shouldn't
+contain any type definition that is tied to a specific C++ library (e.g., ATen), so that it can easily be reused
+if we want to generate code for another C++ library.
+
+Add new types to `types.py` if these types are ATen/c10 related.
+Add new types to `types_base.py` if they are basic and not attached to ATen/c10.
+"""
+
+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+from dataclasses import dataclass
+from enum import auto, Enum
+from typing import TYPE_CHECKING, Union
+
+
+if TYPE_CHECKING:
+    from torchgen.model import Argument, SelfArgument, TensorOptionsArguments
+
+
+# An ArgName is just the str name of the argument in schema;
+# but in some special circumstances, we may add a little extra
+# context. The Enum SpecialArgName covers all of these cases;
+# grep for their construction sites to see when they can occur.
+
+
+class SpecialArgName(Enum):
+    possibly_redundant_memory_format = auto()
+
+
+ArgName = Union[str, SpecialArgName]
+
+
+# This class shouldn't be created directly; instead, use/create one of the singletons below.
+@dataclass(frozen=True)
+class BaseCppType:
+    ns: str | None
+    name: str
+
+    def __str__(self) -> str:
+        if self.ns is None or self.ns == "":
+            return self.name
+        return f"{self.ns}::{self.name}"
+
+
+# The set of all non-templated, valid, fully-qualified names of C++ types that are used in the codegen.
+# Templated types get their own dataclass, mainly to make namespace parsing easier.
+byteT = BaseCppType("", "uint8_t")
+charT = BaseCppType("", "int8_t")
+shortT = BaseCppType("", "int16_t")
+# It would be more symmetric for this to be called intT, but it is easy to mix
+# this up with JIT int (which is int64_t in C++), so we intentionally don't
+# define intT, to make it obvious when you've mixed them up
+int32T = BaseCppType("", "int32_t")
+longT = BaseCppType("", "int64_t")
+doubleT = BaseCppType("", "double")
+floatT = BaseCppType("", "float")
+boolT = BaseCppType("", "bool")
+voidT = BaseCppType("", "void")
+
+
+class CType(ABC):
+    @abstractmethod
+    def cpp_type(self, *, strip_ref: bool = False) -> str:
+        raise NotImplementedError
+
+    @abstractmethod
+    def remove_const_ref(self) -> CType:
+        return self
+
+
+@dataclass(frozen=True)
+class BaseCType(CType):
+    type: BaseCppType
+
+    def cpp_type(self, *, strip_ref: bool = False) -> str:
+        return str(self.type)
+
+    def remove_const_ref(self) -> CType:
+        return self
+
+
+@dataclass(frozen=True)
+class ConstRefCType(CType):
+    elem: CType
+
+    def cpp_type(self, *, strip_ref: bool = False) -> str:
+        if strip_ref:
+            return self.elem.cpp_type(strip_ref=strip_ref)
+        return f"const {self.elem.cpp_type()} &"
+
+    def remove_const_ref(self) -> CType:
+        return self.elem.remove_const_ref()
+
+
+@dataclass(frozen=True)
+class VectorCType(CType):
+    elem: CType
+
+    def cpp_type(self, *, strip_ref: bool = False) -> str:
+        # Do not pass `strip_ref` recursively.
+        return f"::std::vector<{self.elem.cpp_type()}>"
+
+    def remove_const_ref(self) -> CType:
+        return VectorCType(self.elem.remove_const_ref())
+
+
+@dataclass(frozen=True)
+class ArrayCType(CType):
+    elem: CType
+    size: int
+
+    def cpp_type(self, *, strip_ref: bool = False) -> str:
+        # Do not pass `strip_ref` recursively.
+ return f"::std::array<{self.elem.cpp_type()},{self.size}>" + + def remove_const_ref(self) -> CType: + return ArrayCType(self.elem.remove_const_ref(), self.size) + + +@dataclass(frozen=True) +class TupleCType(CType): + elems: list[CType] + + def cpp_type(self, *, strip_ref: bool = False) -> str: + # Do not pass `strip_ref` recursively. + return f"::std::tuple<{','.join([e.cpp_type() for e in self.elems])}>" + + def remove_const_ref(self) -> CType: + return TupleCType([e.remove_const_ref() for e in self.elems]) + + +@dataclass(frozen=True) +class MutRefCType(CType): + elem: CType + + def cpp_type(self, *, strip_ref: bool = False) -> str: + if strip_ref: + return self.elem.cpp_type(strip_ref=strip_ref) + return f"{self.elem.cpp_type()} &" + + def remove_const_ref(self) -> CType: + return self.elem.remove_const_ref() + + +# A NamedCType is short for Named C++ semantic type. A NamedCType represents a C++ type, plus +# semantic information about what it represents. For example, consider the +# argument "bool pin_memory"; its normal C++ type is "bool", but its C++ +# semantic type also keeps track that this represents a "pin_memory"; you can't +# just use a random other boolean in a context where you need a "pin_memory"! +# + + +@dataclass(frozen=True) +class NamedCType: + name: ArgName + type: CType + + def cpp_type(self, *, strip_ref: bool = False) -> str: + return self.type.cpp_type(strip_ref=strip_ref) + + def remove_const_ref(self) -> NamedCType: + return NamedCType(self.name, self.type.remove_const_ref()) + + def with_name(self, name: str) -> NamedCType: + return NamedCType(name, self.type) + + +# A binding represents any C++ binding site for a formal parameter. +# We don't distinguish between binding sites for different APIs; +# instead, all of the important distinctions are encoded in CType, +# which you can use to figure out if a given Binding is appropriate +# for use in another context. (See torchgen.api.translate) + + +@dataclass(frozen=True) +class Binding: + name: str + nctype: NamedCType + argument: Argument | TensorOptionsArguments | SelfArgument + # TODO: maybe don't represent default here + default: str | None = None + + def rename(self, name: str) -> Binding: + return Binding( + name=name, + nctype=self.nctype, + argument=self.argument, + default=self.default, + ) + + @property + def type(self) -> str: + return self.nctype.cpp_type() + + def no_default(self) -> Binding: + return Binding( + name=self.name, + nctype=self.nctype, + default=None, + argument=self.argument, + ) + + def decl(self, *, func_ptr_cast: bool = False) -> str: + mb_default = "" + if self.default is not None: + mb_default = f"={self.default}" + + # casting only needs to know the type + if func_ptr_cast: + return f"{self.type}" + else: + return f"{self.type} {self.name}{mb_default}" + + def defn(self) -> str: + return f"{self.type} {self.name}" + + def with_name(self, name: str) -> Binding: + return Binding( + name=name, nctype=self.nctype, argument=self.argument, default=self.default + ) + + +# An Expr is a C++ expression. It has a C++ string representing its syntax, +# as well as a CType saying what it provides. 
+ + +@dataclass(frozen=True) +class Expr: + expr: str + type: NamedCType diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/ufunc.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/ufunc.py new file mode 100644 index 0000000000000000000000000000000000000000..17adcccecab563b6a4003215c778a00d5e1399c4 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/ufunc.py @@ -0,0 +1,209 @@ +from __future__ import annotations + +from dataclasses import dataclass + +import torchgen.api.types as api_types +from torchgen.api import cpp, structured +from torchgen.api.types import ( + ArgName, + BaseCppType, + BaseCType, + Binding, + ConstRefCType, + CType, + NamedCType, + scalarT, +) +from torchgen.model import ( + Argument, + BaseTy, + BaseType, + DispatchKey, + FunctionSchema, + NativeFunctionsGroup, + Type, +) + + +def schema_kernel_name(func: FunctionSchema, dispatch_key: DispatchKey) -> str: + assert func.is_out_fn(), "ufunc.kernel_name should only be invoked on out schemas" + return f"ufunc_{func.name.name}_{dispatch_key}" + + +def kernel_name(g: NativeFunctionsGroup, dispatch_key: DispatchKey) -> str: + return schema_kernel_name(g.out.func, dispatch_key) + + +# Tensors are omitted (as they are stored in TensorIterator), everything else is +# passed along (technically, we can pass tensors along too, it just wastes +# argument registers) +# +# NB: used for CPU only +def dispatchstub_type(t: Type, *, binds: ArgName) -> NamedCType | None: + # Dispatch stubs are always plain ints + r = cpp.valuetype_type(t, binds=binds, symint=False) + if r is not None: + return r + + if t == BaseType(BaseTy.Scalar): + return NamedCType(binds, ConstRefCType(BaseCType(scalarT))) + elif t == BaseType(BaseTy.Tensor): + return None + else: + raise AssertionError(f"unrecognized type {repr(t)}") + + +def opmath_type(scalar_t: BaseCppType) -> BaseCppType: + if scalar_t == api_types.scalar_t: + return api_types.opmath_t + raise NotImplementedError + + +# NB: Tensors in constructor are stored in opmath_t, not scalar_t +# because Tensor in constructor = its a scalar tensor partially applied = +# it can be higher precision and we want to compute in that higher precision +# +# NB: CUDA only +def ufunctor_ctor_type(t: Type, *, binds: ArgName, scalar_t: BaseCppType) -> NamedCType: + r = cpp.valuetype_type(t, binds=binds, symint=False) + if r is not None: + return r + + if t == BaseType(BaseTy.Scalar): + return NamedCType(binds, BaseCType(opmath_type(scalar_t))) + elif t == BaseType(BaseTy.Tensor): + return NamedCType(binds, BaseCType(opmath_type(scalar_t))) + else: + raise AssertionError(f"unrecognized type {repr(t)}") + + +# Only Tensors ever get passed directly to operator() +# +# NB: CUDA only +# (Actually, this works for CPU too) +def ufunctor_apply_type( + t: Type, *, binds: ArgName, scalar_t: BaseCppType +) -> NamedCType: + if t == BaseType(BaseTy.Tensor): + return NamedCType(binds, BaseCType(scalar_t)) + else: + raise AssertionError(f"unrecognized type {repr(t)}") + + +# The actual ufunc template function the user writes. Everything here +# is done in the computation type. 
compute_t is opmath_t in CUDA and scalar_t
+# in CPU
+def ufunc_type(t: Type, *, binds: ArgName, compute_t: CType) -> NamedCType:
+    r = cpp.valuetype_type(t, binds=binds, symint=False)
+    if r is not None:
+        return r
+
+    if t == BaseType(BaseTy.Scalar):
+        return NamedCType(binds, compute_t)
+    elif t == BaseType(BaseTy.Tensor):
+        return NamedCType(binds, compute_t)
+    else:
+        raise AssertionError(f"unrecognized type {repr(t)}")
+
+
+def ufunctor_ctor_argument(a: Argument, scalar_t: BaseCppType) -> Binding:
+    return Binding(
+        nctype=ufunctor_ctor_type(a.type, binds=a.name, scalar_t=scalar_t),
+        name=a.name,
+        default=None,
+        argument=a,
+    )
+
+
+def ufunctor_apply_argument(a: Argument, scalar_t: BaseCppType) -> Binding:
+    return Binding(
+        nctype=ufunctor_apply_type(a.type, binds=a.name, scalar_t=scalar_t),
+        name=a.name,
+        default=None,
+        argument=a,
+    )
+
+
+def ufunc_argument(a: Argument, compute_t: CType) -> Binding:
+    return Binding(
+        nctype=ufunc_type(a.type, binds=a.name, compute_t=compute_t),
+        name=a.name,
+        default=None,
+        argument=a,
+    )
+
+
+@dataclass(frozen=True)
+class UfunctorBindings:
+    ctor: list[Binding]
+    apply: list[Binding]
+
+
+# ufunctors are a CUDA-only concept representing functors that take some of
+# their arguments on a host-side constructor, and the rest in the device-side
+# apply. E.g.,
+#
+# template <typename scalar_t>
+# struct CUDAFunctorOnSelf_add {
+#   using opmath_t = at::opmath_type<scalar_t>;
+#   opmath_t other_;
+#   opmath_t alpha_;
+#   CUDAFunctorOnSelf_add(opmath_t other, opmath_t alpha) : other_(other), alpha_(alpha) {}
+#   __device__ scalar_t operator()(scalar_t self) {
+#     return ufunc::add(static_cast<opmath_t>(self), other_, alpha_);
+#   }
+# };
+#
+# The ctor refers to the constructor CUDAFunctorOnSelf_add, while apply refers
+# to the operator() definition.
+def ufunctor_arguments(
+    g: NativeFunctionsGroup, *, scalar_tensor_idx: int | None, scalar_t: BaseCppType
+) -> UfunctorBindings:
+    ctor = []
+    apply = []
+    for a in g.functional.func.arguments.flat_non_out:
+        if a.type.is_tensor_like():
+            if scalar_tensor_idx == 0:
+                # put it in the ctor anyway
+                ctor.append(ufunctor_ctor_argument(a, scalar_t=scalar_t))
+                scalar_tensor_idx = None
+            else:
+                if scalar_tensor_idx is not None:
+                    scalar_tensor_idx -= 1
+                apply.append(ufunctor_apply_argument(a, scalar_t=scalar_t))
+        else:
+            ctor.append(ufunctor_ctor_argument(a, scalar_t=scalar_t))
+    assert scalar_tensor_idx is None
+    return UfunctorBindings(ctor=ctor, apply=apply)
+
+
+# ufuncs are the inner loop template functions that you wrote in ufunc/add.h
+# which do the actual computation in question. E.g.,
+#
+# template <typename T>
+# C10_HOST_DEVICE T add(T self, T other, T alpha) __ubsan_ignore_undefined__ {
+#   return self + alpha * other;
+# }
+#
+# In this file, we refer to T as compute_t, which is bound by the caller
+def ufunc_arguments(g: NativeFunctionsGroup, *, compute_t: CType) -> list[Binding]:
+    return [
+        ufunc_argument(a, compute_t=compute_t)
+        for a in g.functional.func.arguments.flat_non_out
+    ]
+
+
+# Stubs are the DispatchStub trampolines that CPU kernels use to get to their
+# vectorized versions.
E.g., +# +# using structured_binary_fn_alpha = void(*)(TensorIteratorBase&, const Scalar& alpha); +# DECLARE_DISPATCH(structured_binary_fn_alpha, add_stub); +def stub_arguments(g: NativeFunctionsGroup) -> list[Binding]: + # stubs drop all tensor arguments (they are implicit in the TensorIterator + # argument and keep everything else) + return [ + r + for a in g.out.func.arguments.flat_non_out + if not a.type.is_tensor_like() + for r in structured.argument(a) + ] diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/unboxing.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/unboxing.py new file mode 100644 index 0000000000000000000000000000000000000000..edb48ec5d172a7063b4003536506ed33f0f293fa --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/api/unboxing.py @@ -0,0 +1,241 @@ +from __future__ import annotations + +from torchgen.api import cpp +from torchgen.api.types import Binding, CppSignatureGroup, CType +from torchgen.model import ( + Argument, + BaseTy, + BaseType, + ListType, + NativeFunction, + OptionalType, + Type, +) + + +# This file generates the code for unboxing wrappers, i.e., the glue logic to unbox a boxed operator and convert the +# ivalues from stack to correct arguments to the unboxed kernel, based on corresponding JIT schema. This codegen is +# an alternative way to generate unboxing wrappers similar to the existing C++ metaprogramming approach but gets the +# job done statically. These generated unboxing wrappers will be useful under the scenario where we need to register +# a fixed set of operators known at compile time and thus can save some time in runtime initialization phase. +# +# Here's an example on how the codegen works: +# +# - Function Schema (source of truth) +# +# aten::empty.names(int[] size, *, Dimname[]? names, +# ScalarType? dtype=None, Layout? layout=None, +# Device? device=None, bool? pin_memory=None, +# MemoryFormat? memory_format=None) -> Tensor +# - Argument Conversion +# Generates C++ code to convert an ivalue (from stack) to its underlying C++ type. +# - int[] size +# ```cpp +# const c10::List size_list_in = (std::move(peek(stack, 0, 7))).toList(); +# +# std::vector size_vec; +# for (c10::IValue size_elem: size_list_in) { +# int64_t size_base = size_elem.to(); +# size_vec.push_back(size_base); +# } +# at::ArrayRef size_list_out(size_vec); +# ~~~~~~~~~~~~~ <-- The converted argument from ivalues in the stack. +# Will be passed to unboxed kernel. +# ``` +# - Dimname[]? names +# ```cpp +# ::std::optional names_opt = (std::move(peek(stack, 1, 7))).toOptional(); +# ::std::optional> names_opt_out; +# if (names_opt.has_value()) { +# ~~~~~~~~~~~ <-- Unwrapping optional shell +# const c10::IValue names_opt_in = names_opt.value(); +# const c10::List names_list_in = names_opt_in.toList(); +# +# std::vector names_vec; +# for (c10::IValue names_elem: names_list_in) { +# ~~~~~~~~~~~~~~~~~~~~~~~~~ <-- Unrolling list, then convert elements one by one. +# at::Dimname names_base = names_elem.to(); +# names_vec.push_back(names_base); +# } +# at::ArrayRef names_list_out(names_vec); +# +# names_opt_out = ::std::optional>(names_list_out); +# } else { +# names_opt_out = ::std::optional>(); +# } +# ``` +# - ScalarType? 
dtype (similarly for the rest of the arguments) +# ```cpp +# ::std::optional dtype_opt = (std::move(peek(stack, 2, 7))).toOptional(); +# ::std::optional dtype_opt_out; +# if (dtype_opt.has_value()) { +# const c10::IValue dtype_opt_in = dtype_opt.value(); +# at::ScalarType dtype_base = dtype_opt_in.to(); +# ~~~~~~~~~~~~~~~~~~~~ <-- For base types, convert ivalue to it +# directly using ".to()" API. +# dtype_opt_out = ::std::optional(dtype_base); +# } else { +# dtype_opt_out = ::std::optional(); +# } +# ``` +# +# - Unboxed Kernel Call +# ```cpp +# auto result_ = torch::empty( +# size_list_out, +# names_opt_out, +# options, +# memory_format_opt_out +# ); +# ``` +# +# - Push Result Back to Stack +# ```cpp +# drop(stack, 7); +# pack(stack, std::move(result_)); +# ``` +connector = "\n\t" + + +# Return unboxing function name for a NativeFunction +def name(f: NativeFunction) -> str: + return f.func.name.unambiguous_name() + + +# Convert all the arguments in a NativeFunction to C++ code +def convert_arguments(f: NativeFunction) -> tuple[list[Binding], list[str]]: + # we need the 'self' argument so method needs to be False + args = ( + CppSignatureGroup.from_native_function(f, method=False) + .most_faithful_signature() + .arguments() + ) + code_list = [ + f"c10::IValue {args[i].name} = std::move(peek(stack, {i}, {len(args)}));" + for i in range(len(args)) + ] + [""] + binding_list = [] + for arg in args: + # expecting only Argument + if not isinstance(arg.argument, Argument): + raise Exception( # noqa: TRY002 + f"Unexpected argument type, expecting `Argument` but got {arg}" + ) + argument: Argument = arg.argument + unboxed_name, _, code, decl = argumenttype_ivalue_convert( + argument.type, + argument.name, + mutable=argument.is_write, + ) + code_list.extend(decl) + code_list.extend(code) + binding_list.append(arg.with_name(unboxed_name)) + return binding_list, code_list + + +# Takes in the type, name and mutability corresponding to an argument, and generates a tuple of: +# (1) the C++ code necessary to unbox the argument +# (2) A Binding corresponding to the newly created unboxed variable, including variable name and its CType +def argumenttype_ivalue_convert( + t: Type, arg_name: str, *, mutable: bool = False +) -> tuple[str, CType, list[str], list[str]]: + # Unboxing is for mobile, which doesn't care about SymInts + ctype = cpp.argumenttype_type( + t=t, mutable=mutable, binds=arg_name, symint=False + ).type + + if isinstance(t, BaseType): + out_name = f"{arg_name}_base" + code, decl = _gen_code_base_type( + arg_name=arg_name, out_name=out_name, ctype=ctype + ) + elif isinstance(t, OptionalType): + out_name = f"{arg_name}_opt_out" + code, decl = _gen_code_optional_type( + arg_name=arg_name, + out_name=out_name, + t=t, + ctype=ctype, + ) + elif isinstance(t, ListType): + out_name = f"{arg_name}_list_out" + code, decl = _gen_code_list_type( + arg_name=arg_name, + out_name=out_name, + t=t, + ctype=ctype, + ) + else: + raise Exception(f"Cannot handle type {t}. 
arg_name: {arg_name}") # noqa: TRY002 + return out_name, ctype, code, decl + + +def _gen_code_base_type( + arg_name: str, out_name: str, ctype: CType +) -> tuple[list[str], list[str]]: + return [ + f"{ctype.cpp_type(strip_ref=True)} {out_name} = {arg_name}.to<{ctype.cpp_type(strip_ref=True)}>();" + ], [] + + +def _gen_code_optional_type( + arg_name: str, out_name: str, t: OptionalType, ctype: CType +) -> tuple[list[str], list[str]]: + in_name = f"{arg_name}_opt_in" + res_name, _, res_code, decl = argumenttype_ivalue_convert(t.elem, in_name) + return ( + f""" +auto {arg_name}_opt = {arg_name}.toOptional(); +{ctype.cpp_type(strip_ref=True)} {out_name}; +if ({arg_name}_opt.has_value()) {{ + const c10::IValue {in_name} = {arg_name}_opt.value(); + {connector.join(res_code)} + {out_name} = {ctype.cpp_type(strip_ref=True)}({res_name}); +}} else {{ + {out_name} = {ctype.cpp_type(strip_ref=True)}(); +}} + """.split("\n"), + decl, + ) + + +def _gen_code_list_type( + arg_name: str, out_name: str, t: ListType, ctype: CType +) -> tuple[list[str], list[str]]: + in_name = f"{arg_name}_list_in" + elem_name = f"{arg_name}_elem" + code = [f"const c10::List {in_name} = {arg_name}.toList();"] + res_name, res_ctype, res_code, decl = argumenttype_ivalue_convert(t.elem, elem_name) + # handle list type with size, e.g., bool[4] + if isinstance(t.elem, BaseType) and t.elem.name == BaseTy.bool and t.size: + code.extend( + f""" +{ctype.cpp_type(strip_ref=True)} {out_name} = as_array<{res_ctype.cpp_type(strip_ref=True)}, {t.size}>({in_name}); + """.split("\n") + ) + # we have to use c10::List for optional element. e.g., Tensor?[] -> c10::List<::std::optional> + elif isinstance(t.elem, OptionalType): + code.extend( + f""" +{ctype.cpp_type(strip_ref=True)} {out_name}; +for (c10::IValue {elem_name}: {in_name}) {{ + {connector.join(res_code)} + {out_name}.push_back({res_name}); +}} + """.split("\n") + ) + else: + # use ArrayRef as default. 
+ vec_name = arg_name + "_vec" + # need to bring vector instantiation out of scope so that ArrayRef has valid data + decl.append(f"std::vector<{res_ctype.cpp_type(strip_ref=True)}> {vec_name};") + code.extend( + f""" +for (c10::IValue {elem_name}: {in_name}) {{ + {connector.join(res_code)} + {vec_name}.push_back({res_name}); +}} +{ctype.cpp_type(strip_ref=True)} {out_name}({vec_name}); + """.split("\n") + ) + return code, decl diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..8f08a743ae2dc766530fd8f93be9ebb8b7733f21 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__init__.py @@ -0,0 +1,19 @@ +from torchgen.dest.lazy_ir import ( + generate_non_native_lazy_ir_nodes as generate_non_native_lazy_ir_nodes, + GenLazyIR as GenLazyIR, + GenLazyNativeFuncDefinition as GenLazyNativeFuncDefinition, + GenLazyShapeInferenceDefinition as GenLazyShapeInferenceDefinition, +) +from torchgen.dest.native_functions import ( + compute_native_function_declaration as compute_native_function_declaration, +) +from torchgen.dest.register_dispatch_key import ( + gen_registration_headers as gen_registration_headers, + gen_registration_helpers as gen_registration_helpers, + RegisterDispatchKey as RegisterDispatchKey, +) +from torchgen.dest.ufunc import ( + compute_ufunc_cpu as compute_ufunc_cpu, + compute_ufunc_cpu_kernel as compute_ufunc_cpu_kernel, + compute_ufunc_cuda as compute_ufunc_cuda, +) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..9a14bb3f2be2dc4abfcb2d4bc3df3047d231905f Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/lazy_ir.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/lazy_ir.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..a67931ea9f7095f0d17e01d0c5ef2b932ab5676c Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/lazy_ir.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/lazy_ts_lowering.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/lazy_ts_lowering.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..d90369a302074b33ff2b00b8a9bd6bbc6f307d0e Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/lazy_ts_lowering.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/native_functions.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/native_functions.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e5556424df1e9db69ece73002528f2c033416c90 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/native_functions.cpython-312.pyc differ diff --git 
a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/register_dispatch_key.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/register_dispatch_key.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5409924a16f8092a75b084cc4f75e2b903d90ad9 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/register_dispatch_key.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/ufunc.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/ufunc.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6e6959f8814d0df5ae8e5a67e964c74f63bc383e Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/__pycache__/ufunc.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/lazy_ir.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/lazy_ir.py new file mode 100644 index 0000000000000000000000000000000000000000..b912b8f2427f8848b1a65736f9b36b71b85c06ad --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/lazy_ir.py @@ -0,0 +1,707 @@ +from __future__ import annotations + +import itertools +from abc import ABC +from dataclasses import dataclass +from typing import Any + +import torchgen.api.dispatcher as dispatcher +from torchgen.api.lazy import ( + getValueT, + isValueType, + LazyArgument, + LazyIrProperties, + LazyIrSchema, + tensorListValueT, +) +from torchgen.api.translate import translate +from torchgen.api.types import ( + BaseCType, + Binding, + deviceT, + DispatcherSignature, + kernel_signature, + NativeSignature, + OptionalCType, + VectorCType, +) +from torchgen.context import method_with_native_function +from torchgen.dest.lazy_ts_lowering import ts_lowering_body +from torchgen.model import ( + Argument, + BackendIndex, + BackendMetadata, + BaseTy, + BaseType, + FunctionSchema, + ListType, + NativeFunction, + NativeFunctionsGroup, +) + + +def node_ctor_arg_rvalue_string(arg: LazyArgument) -> str: + """ + Given a LazyArgument, + generate a c++ string for materializing an rvalue of that arg for passing into + a lazy Node constructor. + """ + + # TODO: Matching on CType seems wrong; should be matching on Type + if isValueType(arg.lazy_type): + if isinstance(arg.lazy_type, BaseCType): + if arg.is_wrapped_scalar: + return f"node_{arg.name}" + elif arg.lazy_type.type is tensorListValueT: + return f"lazy_{arg.name}_tensorlist" + elif arg.is_symint_or_list: + return f"GetSymIntValue({arg.name})" + return f"lazy_{arg.name}->GetIrValue()" + elif isinstance(arg.lazy_type, OptionalCType): + if arg.is_symint_or_list: + # TODO: I don't understand when you should put lazy_ in the name + # or not + return f"{arg.name} ? std::make_optional(GetSymIntValue(*{arg.name})) : ::std::nullopt" + elif arg.is_wrapped_scalar: + return f"node_{arg.name}" + return ( + f"lazy_{arg.name} ? 
" + f"std::make_optional(lazy_{arg.name}->GetIrValue()) : " + "::std::nullopt" + ) + else: + raise AssertionError( + f"TODO not sure if there are other valid types to handle here ({arg.lazy_type})" + ) + else: + # NB: this is here because right now we aren't treating SymInt[] as a + # value type; when we do this needs to move above + # NB: we cannot test arg.lazy_type as we've already specified it is an + # int64_t and so we cannot distinguish between SymInt and int64_t + if isinstance(arg.orig_type, ListType) and arg.orig_type.elem == BaseType( + BaseTy.SymInt + ): + if arg.symint: + return f"GetSymIntArrayRefValue({arg.name})" + else: + return f"std::vector({arg.name}.begin(), {arg.name}.end())" + elif isinstance(arg.lazy_type, VectorCType) and isinstance( + arg.lazy_type.elem, BaseCType + ): + return f"std::vector<{arg.lazy_type.elem.type}>({arg.name}.begin(), {arg.name}.end())" + elif ( + isinstance(arg.lazy_type, OptionalCType) + and isinstance(arg.lazy_type.elem, VectorCType) + and isinstance(arg.lazy_type.elem.elem, BaseCType) + ): + return f"torch::lazy::ToOptionalVector<{arg.lazy_type.elem.elem.type}>({arg.name})" + else: + return f"{arg.name}" + + +def node_ctor_inputs(schema: LazyIrSchema) -> str: + """ + Produce a formatted string with the arguments as passed into the constructor of a node class. + """ + node_ctor_values = [ + node_ctor_arg_rvalue_string(arg) for arg in schema.filtered_args() + ] + return ", ".join(node_ctor_values) + + +def gen_fallback_code( + schema: LazyIrSchema, + sig: DispatcherSignature | NativeSignature, + overload_name: str, +) -> str: + """ + Generate code that falls back to eager conditioned on a predicate + """ + dispatcher_sig = DispatcherSignature.from_schema(schema.func) + exprs = translate(sig.arguments(), dispatcher_sig.arguments()) + fallback_args = ",\n ".join([a.expr for a in exprs]) + if len(overload_name): + aten_op_str = f"ATEN_OP2({schema.aten_name}, {overload_name})" + else: + aten_op_str = f"ATEN_OP({schema.aten_name})" + return f""" + if (force_eager_fallback({aten_symbol(schema)})) {{ + return at::native::call_fallback_fn_symint<<c_eager_fallback, {aten_op_str}>::call( + {fallback_args} + ); + }} +""" + + +def aten_symbol(schema: LazyIrSchema) -> str: + missing_interned_strings = { + "sigmoid_backward", + } + if schema.aten_name in missing_interned_strings: + return f'c10::Symbol::fromQualString("aten::{schema.aten_name}")' + + if not schema.aten_name.startswith("at::"): + return f"at::aten::{schema.aten_name}" + else: + return schema.aten_name + + +# converts all tensor-like arguments to meta tensors. Returns: +# (1) a string containing all of the logic that does the conversions. +# (2) a context, to be used by translate(), with all of the relevant bindings. 
+def convert_to_meta_tensors(sig: DispatcherSignature) -> tuple[str, list[Binding]]: + context: list[Binding] = [] + unwrapped_tensor_args: list[str] = [] + for arg in sig.arguments(): + if isinstance(arg.argument, Argument) and arg.argument.type.is_tensor_like(): + unwrapped_name = f"{arg.name}_meta" + unwrapped_tensor_args.append( + f"auto {unwrapped_name} = to_meta({arg.name});" + ) + context.append(arg.with_name(unwrapped_name)) + else: + context.append(arg) + unwrap_tensor_args_str = "\n ".join(unwrapped_tensor_args) + return unwrap_tensor_args_str, context + + +@dataclass(frozen=True) +class GenLazyIR(ABC): + backend_index: BackendIndex + backend_name: str + node_base: str + use_lazy_shape: bool + + @method_with_native_function + def __call__(self, f: NativeFunctionsGroup | NativeFunction) -> list[str]: + func = f.functional.func if isinstance(f, NativeFunctionsGroup) else f.func + metadata = self.backend_index.get_kernel( + f.functional if isinstance(f, NativeFunctionsGroup) else f + ) + schema = LazyIrSchema( + func, symint=metadata is not None and metadata.supports_symint() + ) + return self.gen(schema) + + # there is no lowering functionality generated unless this IR base class is subclassed and + # implemented as a backend-specific node + def lowering_function(self, schema: LazyIrSchema) -> str: + return "" + + def create_function(self, schema: LazyIrSchema, node_ctor_args: str) -> str: + return "" + + def can_be_reused_function(self, schema: LazyIrSchema, node_ctor_args: str) -> str: + return f"""bool CanBeReused({node_ctor_args}) const {{ + return false; + }}""" + + def node_base_ctor_call(self, schema: LazyIrSchema) -> str: + value_args = schema.filtered_args(values=True, scalars=False) + # backends can customize the way the node base class constructor is called, + # as long as all of its arguments can be generated from information available from the schema + base_ctor_value_args_list = [] + for arg in value_args: + if isinstance(arg.lazy_type, (BaseCType, VectorCType)): + base_ctor_value_args_list.append(f"{arg.name}") + elif isinstance(arg.lazy_type, OptionalCType): + base_ctor_value_args_list.append(f"{arg.name}.value_or(kNullValue)") + else: + raise AssertionError( + f"Unsupported type ({arg.lazy_type}) - add support if necessary" + ) + base_ctor_value_args = ", ".join(base_ctor_value_args_list) + + scalar_args = schema.filtered_args(values=False, scalars=True) + + # Shape construction. 
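+        # (Illustrative sketch of the C++ ctor call assembled here, assuming a
+        # hypothetical Add node with one scalar alpha and precomputed shapes:
+        #   Add(Add::ClassOpKind(), OpList{self, other}, std::move(shapes),
+        #       /* num_outputs */ 1, torch::lazy::MHash(alpha))
+        # The shape argument below varies with the node's shape property.)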
+ # Conditionally build shape depending on specified shape property + if schema.properties.ShapePrecompute: + shape_ctor_arg = "std::move(shapes)," + elif schema.properties.ShapeCompute: + shape_args = [a.name for a in value_args] + shape_args.extend(a.name for a in scalar_args) + shape_ctor_arg = f"compute_shape_{schema.name}({', '.join(shape_args)})," + elif schema.properties.ShapeCache: + shape_args = [f"operand({i})" for i in range(len(value_args))] + shape_args.extend(a.name for a in scalar_args) + shape_ctor_arg = f"[&](){{ return compute_shape_{schema.name}({', '.join(shape_args)})[0]; }}," + else: + shape_ctor_arg = "" + + scalar_hashes = ", ".join(f"{a.name}" for a in scalar_args) + + return f"""{self.node_base}( + {schema.node_name}::ClassOpKind(), + OpList{{{base_ctor_value_args}}}, + {shape_ctor_arg} + /* num_outputs */ {len(schema.returns)}, + torch::lazy::MHash({scalar_hashes}))""" + + def gen(self, schema: LazyIrSchema) -> list[str]: + opkind = schema.opkind or aten_symbol(schema) + + # for now, we just want one IR class decl and soon after also the method defs + # and we use the functional version not out/inplace. + all_args = schema.filtered_args() + scalar_args = schema.filtered_args(values=False, scalars=True) + + ctor_args = [f"const {i.lazy_type.cpp_type()}& {i.name}" for i in all_args] + reuse_ctor_args = ", ".join(ctor_args) + if self.use_lazy_shape and schema.properties.ShapePrecompute: + ctor_args.append("std::vector&& shapes") + node_ctor_args = ", ".join(ctor_args) + + scalar_initializers = ",\n ".join( + [ + # This code is just special casing the mapping from string_view -> strings + f"{a.name}({a.name}.has_value() ? ::std::make_optional(std::string(*{a.name})) : ::std::nullopt)" + if a.lazy_type.cpp_type() == "::std::optional" + else f"{a.name}({a.name})" + for a in scalar_args + ] + ) + if len(scalar_initializers): + scalar_initializers = f",\n {scalar_initializers}" + scalar_decls = "\n ".join( + [ + f"std::string {a.name};" + if a.lazy_type.cpp_type() == "c10::string_view" + else f"::std::optional {a.name};" + if a.lazy_type.cpp_type() == "::std::optional" + else f"{a.lazy_type.cpp_type()} {a.name};" + for a in scalar_args + ] + ) + optional_values = [ + arg.name + for arg in schema.filtered_args(values=True, scalars=False) + if isinstance(arg.lazy_type, OptionalCType) + ] + has_optional_decls = "\n ".join( + [f"bool has_{value}: 1;" for value in optional_values] + ) + has_optional_defs = "\n ".join( + [f"has_{value} = !!{value};" for value in optional_values] + ) + members_to_string = [] + for arg in scalar_args: + if isinstance(arg.lazy_type, OptionalCType): + value = f"{arg.name}.value()" + if arg.is_generator: + value = '"torch.Generator()"' + members_to_string.append( + f"""if ({arg.name}.has_value()) {{ + ss << ", {arg.name}=" << {value}; + }} else {{ + ss << ", {arg.name}=null"; + }}""" + ) + else: + members_to_string.append(f'ss << ", {arg.name}=" << {arg.name};') + members_to_string_str = "\n ".join(members_to_string) + + return [ + f"""\ +class {schema.node_name} : public {self.node_base} {{ + public: + static torch::lazy::OpKind ClassOpKind() {{ + return torch::lazy::OpKind({opkind}); + }} + + {schema.node_name}({node_ctor_args}) + : {self.node_base_ctor_call(schema)}{scalar_initializers} + {{ + {has_optional_defs} + }} + + std::string ToString() const override {{ + std::stringstream ss; + ss << {self.node_base}::ToString(); + {members_to_string_str} + return ss.str(); + }} + + {self.create_function(schema, reuse_ctor_args)} + + 
{self.can_be_reused_function(schema, reuse_ctor_args)} + + {self.lowering_function(schema)} + + {scalar_decls} + {has_optional_decls} + +}}; + +""", + ] + + +@dataclass(frozen=True) +class GenTSLazyIR(GenLazyIR): + def lowering_function(self, schema: LazyIrSchema) -> str: + signature = """ + torch::lazy::TSOpVector Lower( + std::shared_ptr function, + torch::lazy::TSLoweringContext* loctx) const override""" + + if schema.properties.LowerDeclOnly: + return f"{signature};" + elif schema.properties.Lower: + return f"""{signature} {{ + {ts_lowering_body(schema)} + }} + """ + else: + return "" + + def create_function(self, schema: LazyIrSchema, node_ctor_args: str) -> str: + signature = f"static NodePtr Create({node_ctor_args})" + if schema.properties.CreateFnDeclOnly: + return f"{signature};" + elif not schema.properties.CreateFn: + return "" + return f"""{signature} {{ + return ReuseOrMakeNode<{schema.node_name}>(data); + }}""" + + def can_be_reused_function(self, schema: LazyIrSchema, node_ctor_args: str) -> str: + signature = f"bool CanBeReused({node_ctor_args}) const" + if schema.properties.CanBeReusedDeclOnly: + return f"{signature};" + elif not schema.properties.CanBeReused: + return "" + value_comparison = [] + for arg in itertools.chain(schema.positional_values, schema.keyword_values): + if isinstance(arg.lazy_type, OptionalCType): + value_comparison.append( + f"nullable_operand(i++) == {arg.name}.value_or(kNullValue)" + ) + else: + value_comparison.append(f"operand(i++) == {arg.name}") + for arg in itertools.chain(schema.positional_scalars, schema.keyword_scalars): + if isinstance(arg.lazy_type, OptionalCType): + value_comparison.append( + f"((!this->{arg.name}&&!{arg.name}) || (this->{arg.name}&&{arg.name} && *(this->{arg.name}) == *{arg.name}))" + ) + else: + value_comparison.append(f"this->{arg.name} == {arg.name}") + value_comparison_str = " &&\n ".join(value_comparison) + + return f"""{signature} {{ + size_t i = 0; + return ({value_comparison_str}); + }}""" + + +@dataclass(frozen=True) +class GenLazyNativeFuncDefinition: + class_method_name: str + backend_index: BackendIndex + tensor_class: str + gen_forced_fallback_code: bool + backend_namespace: str + get_tensorlist: str + get_tensor_or_wrap_number: str + try_get_tensor: str + metrics_counter: str + create_tensor: str + create_from_first_tensor: bool + create_aten_from_ltc_tensor: str + tuple_aten_from_ltc_tensors: str + lazy_tensor_ptr: str + get_device_fn: str + + def lazy_tensor_decls(self, func: NativeFunction, schema: LazyIrSchema) -> str: + value_args = schema.filtered_args(values=True, scalars=False) + # Generates lazy_{name} variables for LazyTensors wrapping input tensors + lazy_tensor_decls: list[str] = [] + for arg in value_args: + if arg.is_wrapped_scalar: + if isinstance(arg.lazy_type, OptionalCType): + lazy_tensor_decls.append( + f"""auto node_{arg.name} = {arg.name} ? 
+ std::make_optional(torch::lazy::LazyGraphExecutor::Get()-> + GetIrValueForScalarFromCodegen(*{arg.name}, *common_device)): + ::std::nullopt;""" + ) + else: + lazy_tensor_decls.append( + f"""auto node_{arg.name} = torch::lazy::LazyGraphExecutor::Get()-> + GetIrValueForScalarFromCodegen({arg.name}, *common_device);""" + ) + elif arg.is_symint_or_list: + continue # values are extracted in isValueType + elif isinstance(arg.lazy_type, BaseCType): + if arg.lazy_type.type is tensorListValueT: + lazy_tensor_decls.append( + f"auto lazy_{arg.name}_tensorlist = " + f"{self.backend_namespace}::{self.get_tensorlist}({arg.name});" + ) + else: + lazy_tensor_decls.append( + f"{self.lazy_tensor_ptr} lazy_{arg.name} = " + f"{self.backend_namespace}::{self.get_tensor_or_wrap_number}({arg.name}, *common_device);" + ) + elif isinstance(arg.lazy_type, OptionalCType): + assert arg.lazy_type.elem == BaseCType(getValueT()), arg.lazy_type.elem + # TODO(alanwaketan): Maybe we want to apply GetLtcTensorOrCreateForWrappedNumber here, but hold it + # until we encounter a real world example. + lazy_tensor_decls.append( + f"{self.lazy_tensor_ptr} lazy_{arg.name} = " + f"{self.backend_namespace}::{self.try_get_tensor}({arg.name}.value_or(at::Tensor()));" + ) + else: + raise AssertionError( + f"TODO not sure if there are other valid types to handle here ({arg.lazy_type})" + ) + return ("\n ").join(lazy_tensor_decls) + + def force_eager_fallback( + self, + func: NativeFunction, + schema: LazyIrSchema, + metadata: BackendMetadata, + sig: DispatcherSignature | NativeSignature, + ) -> str: + if self.gen_forced_fallback_code: + return gen_fallback_code( + schema, sig, overload_name=func.func.name.overload_name + ) + return "" + + def metrics(self, func: NativeFunction, schema: LazyIrSchema) -> str: + return f"{self.metrics_counter};" + + def get_device(self, func: NativeFunction, schema: LazyIrSchema) -> str: + value_args = schema.filtered_args(values=True, scalars=False) + scalar_args = schema.filtered_args(values=False, scalars=True) + value_types_names = [f"{a.name}" for a in value_args if not a.is_wrapped_scalar] + optional_device = OptionalCType(BaseCType(deviceT)) + optional_devices = [ + a.name for a in scalar_args if a.lazy_type == optional_device + ] + assert len(value_types_names) > 0 or len(optional_devices) > 0, ( + "Expected at least one Value or Device type" + ) + get_device_str = ( + f"{self.get_device_fn}({', '.join(value_types_names + optional_devices)})" + ) + return f"""auto common_device = {get_device_str}; + TORCH_INTERNAL_ASSERT(common_device); + """ + + def shape_inference(self, func: NativeFunction, schema: LazyIrSchema) -> str: + metadata = self.backend_index.get_kernel(func) + assert metadata is not None + all_args = schema.filtered_args() + returns_length = len(schema.returns) + # call the meta kernel if it exists, to compute output shape/dtype for our IR + # Note [Generated LTC Shape Functions] + # LTC uses meta tensors from core to do shape inference when possible, and otherwise + # we generate a shape function declaration that needs to be manually implemented. + # How do we detect which ops are eligible to use meta tensors? + # In general we should be able to use meta tensors not just on structured operators, + # but also on composite operators that are implemented in terms of structured kernels. + # We don't currently have a way of knowing at codegen time which ops are implemented that way. 
+ # This is the case for all view and view_copy operators however, so we're going to + # use them specifically for all of the view_copy ops (instead of manually writing shape rules for all of them). + is_view_copy_op = "view_copy" in func.tags + is_structured = func.structured or func.structured_delegate is not None + if is_structured or is_view_copy_op: + meta_out = """ +std::vector shapes{torch::lazy::Shape(out_meta.scalar_type(), out_meta.sizes().vec())};""" + if returns_length > 1: + + def this_shape(i: int) -> str: + return f"torch::lazy::Shape(std::get<{i}>(out_meta).scalar_type(), std::get<{i}>(out_meta).sizes().vec())" + + shapes_str = ",".join([this_shape(i) for i in range(returns_length)]) + meta_out = "std::vector shapes{" + shapes_str + "};" + + # Convert tensor args to the meta device and call it. + # (We can't pass in the input tensors directly, because they are "functional wrappers". + # If any of the meta kernels call a tensor op and redispatch, we don't want to hit the functionalize kernels.) + # Even at::meta:: functions might redispatch, e.g. if they call into view ops. + dispatcher_sig = DispatcherSignature.from_schema(func.func) + meta_conversion_str, meta_call_ctx = convert_to_meta_tensors(dispatcher_sig) + meta_call_args = [ + e.expr + for e in translate( + meta_call_ctx, dispatcher_sig.arguments(), method=False + ) + ] + if is_view_copy_op: + # view_copy ops always have a CompositeExplicitAutogradNonFunctional kernel + assert func.has_composite_explicit_autograd_non_functional_kernel + dispatch_ns = "compositeexplicitautogradnonfunctional" + else: + dispatch_ns = "meta" + aten_name = schema.aten_name + # TODO: this is trolling + if func.func.has_symint() and metadata.supports_symint(): + aten_name += "_symint" + shape_str = f"""\ + {meta_conversion_str} + auto out_meta = at::{dispatch_ns}::{aten_name}({", ".join(meta_call_args)}); + {meta_out}""" + else: + shape_sig = ComputeShapeSignature( + metadata.kernel, func, symint=metadata.supports_symint() + ) + shape_str = f""" + auto shapes = {shape_sig.shape_call};""" + + shape_str += f""" + TORCH_INTERNAL_ASSERT(shapes.size() == {returns_length});""" + + # Calculating which dimensions are symbolic + func_schema_str = "aten::" + str(func.func) + shape_str += f""" + if(torch::lazy::symbolicShapeEnabled()){{ + std::vector inputs = {{ {", ".join(str(a.name) for a in all_args)} }}; + const char* schema_str = "{func_schema_str}"; + applySymbolicShapesOnLT(schema_str, inputs, shapes); + }} + """ + return shape_str + + def build_ir_node(self, func: NativeFunction, schema: LazyIrSchema) -> str: + node_ctor_input_str = node_ctor_inputs(schema) + return f"""torch::lazy::NodePtr node = torch::lazy::ReuseNode<{schema.node_name}>({node_ctor_input_str}); + if (!node) {{ + {self.shape_inference(func, schema)} + node = torch::lazy::MakeNode<{schema.node_name}>({node_ctor_input_str}, std::move(shapes)); + CacheNode(node); + }} + """ + + def create_lazy_tensor(self, first_tensor_name: str | None = None) -> str: + # xla uses an instance method for tensor creation, for the time being + if self.create_from_first_tensor: + # TODO(whc) remove this if XLA switches to using static method for creation + assert first_tensor_name is not None, ( + "Requires first tensor to create lazy tensor" + ) + return f"{first_tensor_name}.{self.create_tensor}" + return f"{self.backend_namespace}::{self.create_tensor}" + + def return_aten_tensor(self, func: NativeFunction, schema: LazyIrSchema) -> str: + returns_length = len(schema.returns) + value_args = 
schema.filtered_args(values=True, scalars=False) + value_types_names = [f"{a.name}" for a in value_args if not a.is_wrapped_scalar] + first_tensor_name = value_types_names[0] if len(value_types_names) > 0 else None + bridge_str = f"""auto result = {self.create_aten_from_ltc_tensor}( + {self.create_lazy_tensor(first_tensor_name)}(std::move(node), *common_device));""" + + if returns_length > 1: + assert len(value_types_names) > 0, ( + "Code below assumes there is at least one tensor arg" + ) + bridge_str = f"""std::vector<{self.lazy_tensor_ptr}> lazy_tensors; + for (int i = 0; i < {returns_length}; i++) {{ + lazy_tensors.push_back({self.create_lazy_tensor(first_tensor_name)}({getValueT()}(node, i), *common_device)); + }} + auto result = {self.tuple_aten_from_ltc_tensors}<{returns_length}>(lazy_tensors);""" + + if schema.name.name.inplace or func.func.is_out_fn(): + assert returns_length == 1, ( + "We assumed there was no such case where an op is an in-place variant " + f"and has tuple outputs, but got tuple of len {returns_length}." + ) + bridge_str = f"""lazy_{first_tensor_name}->SetInPlaceIrValue(node); + auto& result = {first_tensor_name};""" + + bridge_str += """ + return result;""" + return bridge_str + + @method_with_native_function + def __call__(self, func: NativeFunction) -> list[str]: + sig = kernel_signature(func, self.backend_index) + metadata = self.backend_index.get_kernel(func) + assert metadata is not None + schema = LazyIrSchema(func.func, symint=metadata.supports_symint()) + return [ + f"""\ + {sig.decl(name=f"{self.class_method_name}::{metadata.kernel}")} {{ + {self.force_eager_fallback(func, schema, metadata, sig)} + {self.metrics(func, schema)} + {self.get_device(func, schema)} + {self.lazy_tensor_decls(func, schema)} + {self.build_ir_node(func, schema)} + {self.return_aten_tensor(func, schema)} + }}\n + """ + ] + + +class ComputeShapeSignature: + """ + Here we use the base name as the suffix of the signature to avoid generating for in-place variants. 
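+
+    For example (illustrative, with a hypothetical kernel name), a single
+    declaration like
+        TORCH_API std::vector<torch::lazy::Shape> compute_shape_mykernel(...);
+    is emitted once and shared with the in-place variant of the op.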
+ """ + + def __init__(self, kernel_name: str, f: NativeFunction, *, symint: bool) -> None: + self.__schema = LazyIrSchema(f.func, symint=symint) + self.__dispatch_args = ", ".join( + [a.decl() for a in dispatcher.arguments(f.func, symint=symint)] + ) + self.__call_args = ", ".join( + [f"{arg.name}" for arg in self.__schema.filtered_args(generator=True)] + ) + self.__kernel_name = kernel_name + + def __decl_suffix(self) -> str: + return f"{self.__kernel_name}({self.__dispatch_args})" + + def __call_suffix(self) -> str: + return f"{self.__kernel_name}({self.__call_args})" + + @property + def shape_decl(self) -> str: + return f"TORCH_API std::vector compute_shape_{self.__decl_suffix()}" + + @property + def shape_call(self) -> str: + return f"torch::lazy::compute_shape_{self.__call_suffix()}" + + +@dataclass(frozen=True) +class GenLazyShapeInferenceDefinition: + backend_index: BackendIndex + tensor_class: str + + @method_with_native_function + def __call__(self, f: NativeFunction) -> list[str]: + metadata = self.backend_index.get_kernel(f) + assert metadata is not None + + # See Note [Generated LTC Shape Functions] + is_view_copy_op = "view_copy" in f.tags + is_structured = f.structured or f.structured_delegate is not None + if is_structured or is_view_copy_op: + return [] + else: + shape_sig = ComputeShapeSignature( + metadata.kernel, f, symint=metadata.supports_symint() + ) + return ["\n".join([f"{shape_sig.shape_decl};"])] + + +def generate_non_native_lazy_ir_nodes( + non_native: list[dict[str, Any]], gen_lazy_ir: GenLazyIR +) -> list[str]: + """Generate the non-native lazy IR node classes""" + nodes = [] + for op in non_native: + # Set default properties for Non-Native IRs + properties = LazyIrProperties("ShapeCache", "CanBeReused", "LowerDeclOnly") + for p in op.get("properties", []): + setattr(properties, p, True) + + # non-native is assumed to want symint bindings if you wrote symint + schema = LazyIrSchema(FunctionSchema.parse(op["func"]), properties, symint=True) + schema.opkind = op.get("opkind") + nodes.append(gen_lazy_ir.gen(schema)[0]) + + return nodes diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/lazy_ts_lowering.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/lazy_ts_lowering.py new file mode 100644 index 0000000000000000000000000000000000000000..70161216d8e7c95e194b0d89b345e0da886ef989 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/lazy_ts_lowering.py @@ -0,0 +1,48 @@ +from torchgen.api.lazy import LazyArgument, LazyIrSchema +from torchgen.api.types import OptionalCType + + +def ts_lowering_body(schema: LazyIrSchema) -> str: + # for now, we just want one IR class decl and soon after also the method defs + # and we use the functional version not out/inplace. + emplace_arguments = [] + + def get_value(arg: LazyArgument) -> str: + if isinstance(arg.lazy_type, OptionalCType): + return f"has_{arg.name} ? 
loctx->GetOutputOp(operand(i++)) : nullptr" + return "loctx->GetOutputOp(operand(i++))" + + for arg in schema.positional_args: + if arg.is_lazy_value: + emplace_arguments.append(get_value(arg)) + continue + emplace_arguments.append(f'"{arg.name}", {arg.name}') + + emplace_arguments_str = "\n ".join( + [f"arguments.emplace_back({a});" for a in emplace_arguments] + ) + emplace_kwarg_values = [ + f'"{arg.name}", {get_value(arg)}' for arg in schema.keyword_values + ] + emplace_kwarg_scalars = [ + f'"{arg.name}", {arg.name}' for arg in schema.keyword_scalars + ] + emplace_kwarguments = "\n ".join( + [ + f"kwarguments.emplace_back({a});" + for a in emplace_kwarg_values + emplace_kwarg_scalars + ] + ) + return f"""\ + std::vector arguments; + std::vector kwarguments; + arguments.reserve({len(emplace_arguments)}); + kwarguments.reserve({len(emplace_kwarg_values + emplace_kwarg_scalars)}); + size_t i = 0; + {emplace_arguments_str} + {emplace_kwarguments} + torch::lazy::TSOpVector {schema.aten_name}_out = torch::lazy::LowerTSBuiltin(function, op().op, arguments, kwarguments); + TORCH_CHECK_EQ({schema.aten_name}_out.size(), {len(schema.returns)}); + + return {schema.aten_name}_out; +""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/native_functions.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/native_functions.py new file mode 100644 index 0000000000000000000000000000000000000000..05e252d09f9c16888dec66045a92b8aefa19b667 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/native_functions.py @@ -0,0 +1,84 @@ +from __future__ import annotations + +import torchgen.api.meta as meta +import torchgen.api.structured as structured +from torchgen.api.types import kernel_signature +from torchgen.context import with_native_function_and_index +from torchgen.model import BackendIndex, NativeFunction, NativeFunctionsGroup +from torchgen.utils import mapMaybe + + +def torch_api_key_word_prefix(bankend_index: BackendIndex) -> str: + if bankend_index.external: + return "" + + # Although Intel GPU ATen library is out-of-tree, it still utilizes torchgen to produce structured + # kernels. Regarding these produced structured kernels, they should be visible for the Intel GPU ATen + # library. Therefore, we need to add "TORCH_XPU_API" prefix to these structured kernels, + # rather than "TORCH_API". Because the semantic of "TORCH_API" is "hidden" for out-of-tree backends. + # For other in-tree backends like cpu and cuda, they still use "TORCH_API" prefix with "visible" semantic. 
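+    # For example: DispatchKey.XPU maps to "TORCH_XPU_API ", while CPU, CUDA,
+    # and any other in-tree key fall through to the default "TORCH_API ".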
+ device_torch_api_key_word_mapping = { + "XPU": "TORCH_XPU_API", + } + + return ( + device_torch_api_key_word_mapping.get( + bankend_index.dispatch_key.name, "TORCH_API" + ) + + " " + ) + + +@with_native_function_and_index +def gen_unstructured(f: NativeFunction, backend_index: BackendIndex) -> str | None: + sig = kernel_signature(f, backend_index) + metadata = backend_index.get_kernel(f) + if metadata is None: + return None + if "legacy::" in metadata.kernel: + return None + else: + prefix = "static" if backend_index.external else "TORCH_API" + return f"{prefix} {sig.decl(name=metadata.kernel)};" + + +@with_native_function_and_index +def gen_structured(g: NativeFunctionsGroup, backend_index: BackendIndex) -> list[str]: + meta_name = meta.name(g) + out_args = structured.impl_arguments(g) + metadata = backend_index.get_kernel(g) + if metadata is None: + return [] + prefix = torch_api_key_word_prefix(backend_index) + return [ + f"""\ +struct {prefix}structured_{metadata.kernel} : public at::meta::structured_{meta_name} {{ +void impl({", ".join(a.decl() for a in out_args)}); +}}; +""" + ] + + +# Generates NativeFunctions.h, a list of forward declarations of all +# actual kernel definitions we keep in aten/src/ATen/native/ +@with_native_function_and_index +def compute_native_function_declaration( + g: NativeFunctionsGroup | NativeFunction, backend_index: BackendIndex +) -> list[str]: + metadata = backend_index.get_kernel(g) + if isinstance(g, NativeFunctionsGroup): + if metadata is not None and metadata.structured: + if backend_index.external: + # Structured hasn't been tested with external backends yet. + raise AssertionError( + "Structured external backend functions are not implemented yet." + ) + else: + return gen_structured(g, backend_index) + else: + return list( + mapMaybe(lambda f: gen_unstructured(f, backend_index), g.functions()) + ) + else: + x = gen_unstructured(g, backend_index) + return [] if x is None else [x] diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/register_dispatch_key.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/register_dispatch_key.py new file mode 100644 index 0000000000000000000000000000000000000000..52bb9602a73f050301e7f4953364d242e2722e54 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/register_dispatch_key.py @@ -0,0 +1,1016 @@ +from __future__ import annotations + +import itertools +import textwrap +from dataclasses import dataclass +from typing import Literal, TYPE_CHECKING +from typing_extensions import assert_never + +import torchgen.api.cpp as cpp +import torchgen.api.meta as meta +import torchgen.api.structured as structured +from torchgen.api.translate import translate +from torchgen.api.types import ( + BaseCType, + Binding, + ConstRefCType, + CppSignature, + CppSignatureGroup, + DispatcherSignature, + Expr, + kernel_signature, + MutRefCType, + NamedCType, + NativeSignature, + tensorT, +) +from torchgen.context import method_with_native_function, native_function_manager +from torchgen.model import ( + Argument, + BackendIndex, + DeviceCheckType, + DispatchKey, + gets_generated_out_inplace_wrapper, + is_cuda_dispatch_key, + NativeFunction, + NativeFunctionsGroup, + SchemaKind, + TensorOptionsArguments, +) +from torchgen.utils import mapMaybe, Target + + +if TYPE_CHECKING: + from torchgen.selective_build.selector import SelectiveBuilder + + +def gen_registration_headers( + backend_index: BackendIndex, + per_operator_headers: bool, + rocm: bool, +) -> list[str]: + if 
per_operator_headers: + headers = ["#include "] + else: + headers = ["#include "] + + if backend_index.dispatch_key in (DispatchKey.CPU, DispatchKey.Meta): + headers.append("#include ") + elif backend_index.dispatch_key == DispatchKey.CUDA: + if rocm: + headers.append("#include ") + else: + headers.append("#include ") + elif backend_index.dispatch_key == DispatchKey.MPS: + headers.append("#include ") + elif backend_index.dispatch_key == DispatchKey.XPU: + # XPU specific, this header resides in third_party/torch-xpu-ops + headers.append("#include ") + elif backend_index.dispatch_key == DispatchKey.MTIA: + headers.append("#include ") + elif per_operator_headers: + headers += [ + "#include ", + "#include ", + "#include ", + "#include ", + ] + else: + headers.append("#include ") + + headers.append("#include ") + return headers + + +def gen_empty_impl_names( + backend_index: BackendIndex, +) -> tuple[str | None, str | None]: + empty_impl = None + empty_strided_impl = None + + if backend_index.dispatch_key in ( + DispatchKey.Meta, + DispatchKey.CPU, + DispatchKey.CUDA, + DispatchKey.MPS, + DispatchKey.XPU, + DispatchKey.MTIA, + ): + dispatch = str(backend_index.dispatch_key).lower() + empty_impl = f"at::detail::empty_{dispatch}" + empty_strided_impl = f"at::detail::empty_strided_{dispatch}" + elif backend_index.dispatch_key in ( + DispatchKey.CompositeExplicitAutogradNonFunctional, + DispatchKey.QuantizedCPU, + DispatchKey.QuantizedCUDA, + DispatchKey.XPU, + ): + empty_impl = "at::empty" + empty_strided_impl = "at::empty_strided" + + return empty_impl, empty_strided_impl + + +def gen_create_out_helper(backend_index: BackendIndex) -> list[str]: + if backend_index.dispatch_key == DispatchKey.Meta: + empty_options = "options.device(at::kMeta)" + else: + empty_options = "options" + + empty_impl, empty_strided_impl = gen_empty_impl_names(backend_index) + if empty_impl is None: + return [] + + return [ + f""" +Tensor create_out(IntArrayRef sizes, IntArrayRef strides, const TensorOptions &options) {{ + if (strides.empty()) {{ + return {empty_impl}(sizes, {empty_options}); + }} else {{ + return {empty_strided_impl}(sizes, strides, {empty_options}); + }} +}} +""" + ] + + +def gen_maybe_create_proxy_helper(backend_index: BackendIndex) -> list[str]: + _, empty_strided_impl = gen_empty_impl_names(backend_index) + return ( + [] + if empty_strided_impl is None + else [ + f""" +std::optional maybe_create_proxy(const Tensor &out, IntArrayRef sizes, IntArrayRef strides, const TensorOptions &options) {{ + if (out.strides() != strides) {{ + return {empty_strided_impl}(sizes, strides, options); + }} + return std::nullopt; +}} +""" + ] + ) + + +def gen_resize_out_helper(backend_index: BackendIndex) -> list[str]: + if backend_index.dispatch_key == DispatchKey.CompositeExplicitAutogradNonFunctional: + # The function isn't used by this key (since only functional ops have a kernel for this key), + # so we need to not include it to avoid a defined-but-not-used error. 
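+        # Usage sketch (see the structured codegen below): out= kernels call
+        # `resize_out(out, sizes, strides, options);` from their generated
+        # set_output functions, so all other dispatch keys keep this helper.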
+ return [] + return [ + """ +void resize_out(const Tensor &out, IntArrayRef sizes, IntArrayRef strides, const TensorOptions &options) { + TORCH_CHECK(options.dtype() == out.dtype(), + "Expected out tensor to have dtype ", options.dtype(), ", but got ", out.dtype(), " instead"); + TORCH_CHECK(options.device() == out.device(), + "Expected out tensor to have device ", options.device(), ", but got ", out.device(), " instead"); + const bool resized = at::native::resize_output(out, sizes); + // Only restride if a resize occurred; otherwise we ignore the (advisory) + // strides from the meta function and directly use the output tensor's + // preexisting strides + if (resized) { + if (!strides.empty()) { + TORCH_INTERNAL_ASSERT(!options.memory_format_opt().has_value()); + // TODO: avoid the redispatch here + out.as_strided_(sizes, strides); + } else if (options.memory_format_opt().has_value()) { + out.unsafeGetTensorImpl()->empty_tensor_restride(*options.memory_format_opt()); + } + } +} +""" + ] + + +def gen_check_inplace_helper(backend_index: BackendIndex) -> list[str]: + return [ + """ +void check_inplace(const Tensor &self, IntArrayRef sizes, const TensorOptions &options) { + // These checks are needed on those operators that: + // 1) don't use 'TensorIterator' (e.g. 'addmm' and 'baddbmm') + // 2) have particular typing rules (e.g. 'cumsum' and 'cumprod') + // For other operators (e.g. 'add'), 'TensorIterator' already checks + // these things separately. + TORCH_CHECK(options.dtype() == self.dtype(), + "Bad in-place call: ", + "input tensor dtype ", self.dtype(), " and output tensor dtype ", options.dtype(), " should match"); + TORCH_CHECK(options.device() == self.device(), + "Bad in-place call: ", + "input tensor device ", self.device(), " and output tensor device ", options.device(), " should match"); + TORCH_CHECK(sizes == self.sizes(), + "Bad in-place call: ", + "input tensor size ", self.sizes(), " and output tensor size ", sizes, " should match"); +} +""" + ] + + +def gen_registration_helpers(backend_index: BackendIndex) -> list[str]: + return [ + 'C10_DIAGNOSTIC_PUSH_AND_IGNORED_IF_DEFINED("-Wunused-function")', + *gen_create_out_helper(backend_index), + *gen_resize_out_helper(backend_index), + *gen_check_inplace_helper(backend_index), + *gen_maybe_create_proxy_helper(backend_index), + "C10_DIAGNOSTIC_POP()", + ] + + +# Generates Register{dispatch}.cpp (e.g., RegisterCPU.cpp). +# +# - The primary function of this file is to register all of the +# implementations for the given dispatch key to the dispatcher, +# so they are available for use in PyTorch. If dispatch is +# None, we generate schema (def) registrations and catchall +# registrations. +# - The secondary function of this file is to generate a wrapper +# around functions. In CPUType these wrappers do nothing +# (and should be removed), but in other cases they handle +# DeviceGuard. A small extra benefit of wrappers is they +# are not overloaded, so they can be used in the registration +# API without having to disambiguate which overload you want +# (as would be the case if you directly registered native:: +# functions). 
+# - The tertiary function of this file is to generate *static* +# cpp API bindings which can be used to bypass dispatcher +# directly to kernels, but with user-friendly cpp-style API +@dataclass(frozen=True) +class RegisterDispatchKey: + backend_index: BackendIndex + + target: Literal[ + Target.ANONYMOUS_DEFINITION, + Target.NAMESPACED_DEFINITION, + Target.NAMESPACED_DECLARATION, + Target.REGISTRATION, + ] + + # Selector object to determine which operators to generate + # registration code for. + selector: SelectiveBuilder + + # Whether or not we are actually code-genning for ROCm + rocm: bool + + # Whether or not to generate symint registrations or not. External users + # of codegen who don't care about symints can set this to false to get + # non-SymInt codegen + symint: bool + + # The class that all unstructured native functions live under. This is used to improve + # compiler error messages when a kernel writer adds a native function with the wrong signature. + # This is only used in unstructured kernels, since structured kernels already live in a class. + # Finally, this field is currently Optional because it is only used by external backends. + # It would be nice if we can add the same logic to in-tree kernels too, but that requires updating + # all of the existing kernel signatures scattered across aten/src/ATen/native. + class_method_name: str | None + + # Only set to true in lightweight dispatch. If lightweight dispatch is enabled we are registering + # operators into JIT op registry, thus we need to avoid generating code to register into the dispatcher. + skip_dispatcher_op_registration: bool + + @staticmethod + def gen_device_check( + type: DeviceCheckType, args: list[Argument], method_name: str + ) -> str: + if type == DeviceCheckType.NoCheck: + return " // No device check\n" + + device_check = "std::optional common_device = std::nullopt;\n" + device_check += "(void)common_device; // Suppress unused variable warning\n" + for arg in args: + # Only tensor like arguments are eligible + if arg.type.is_tensor_like(): + device_check += f""" + c10::impl::check_and_update_common_device(common_device, {arg.name}, "{method_name}", "{arg.name}");""" + return device_check + + @method_with_native_function + def __call__(self, f: NativeFunctionsGroup | NativeFunction) -> list[str]: + if isinstance(f, NativeFunctionsGroup): + g: NativeFunctionsGroup = f + # Note: We call gen_structured() if the operator is marked structured, regardless of the backend. + # gen_structured() has special logic to handle auto-generated kernels. + if g.structured: + return self.gen_structured(g) + else: + return list( + mapMaybe(lambda f: self.gen_unstructured(f, g), g.functions()) + ) + elif isinstance(f, NativeFunction): + r = self.gen_unstructured(f) + return [] if r is None else [r] + else: + assert_never(f) + + def wrapper_kernel_sig( + self, f: NativeFunction + ) -> NativeSignature | DispatcherSignature: + # The prefix is just to ensure uniqueness. The Dispatcher API doesn't guarantee unique kernel names. 
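+        # Hypothetical example: for add.out registered under CUDA this yields
+        # a wrapper named along the lines of "wrapper_CUDA_out_add_out"
+        # (exact spelling depends on the overload and dispatcher name).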
+ return DispatcherSignature.from_schema( + f.func, + prefix=f"wrapper_{self.backend_index.dispatch_key}_{f.func.name.overload_name}_", + symint=self.symint, + ) + + def gen_out_inplace_wrapper( + self, f: NativeFunction, g: NativeFunctionsGroup | None + ) -> str | None: + if g is None: + return None + k = f.func.kind() + if k is SchemaKind.inplace: + copy_op = "at::_copy_from" + elif k is SchemaKind.out: + copy_op = "at::_copy_from_and_resize" + else: + raise AssertionError("gen_out_inplace_wrapper called on a functional op") + + sig = self.wrapper_kernel_sig(f) + name = sig.name() + + func_res = f"{name}_tmp" + return_names = cpp.return_names(f) + if len(return_names) > 1: + updates = "\n ".join( + f"{copy_op}(std::get<{i}>({func_res}), {ret_name});" + for i, ret_name in enumerate(return_names) + ) + returns = f"{sig.returns_type().cpp_type()}({', '.join(return_names)})" + elif len(return_names) == 1: + ret_name = return_names[0] + updates = f"{copy_op}({func_res}, {ret_name});" + returns = ret_name + else: + assert len(f.func.arguments.out) == 1 + returns = "" + out_arg = f.func.arguments.out[0] + if out_arg.type.is_list_like(): + updates = f"""\ + for (int64_t i = 0; i < {func_res}.size(); ++i) {{ + {copy_op}({func_res}[i], {out_arg.name}[i]); + }}""" + else: + updates = f"{copy_op}({func_res}, {out_arg.name});" + + functional_sig = self.wrapper_kernel_sig(g.functional) + wrapper_name = sig.name() + + return f"""\ +{sig.defn(name=wrapper_name)} {{ + auto {func_res} = {functional_sig.name()}({", ".join(e.expr for e in translate(sig.arguments(), functional_sig.arguments()))}); + {updates} + return {returns}; +}} +""" + + def gen_structured(self, g: NativeFunctionsGroup) -> list[str]: + metadata = self.backend_index.get_kernel(g) + if self.backend_index.dispatch_key == DispatchKey.Meta: + assert not self.backend_index.has_kernel(g.out), ( + "Do not explicitly specify Meta dispatch key on structured " + "functions, they will be automatically generated for you" + ) + elif ( + self.backend_index.dispatch_key + == DispatchKey.CompositeExplicitAutogradNonFunctional + ): + assert not self.backend_index.has_kernel(g.out), ( + "Do not explicitly specify CompositeExplicitAutograd dispatch key on structured " + "functions, they will be automatically generated for you" + ) + elif metadata is None or not metadata.structured: + return list(mapMaybe(lambda f: self.gen_unstructured(f, g), g.functions())) + structured_gen = StructuredRegisterDispatchKey( + self.backend_index, + self.target, + self.selector, + self.rocm, + self.symint, + self.class_method_name, + self.skip_dispatcher_op_registration, + g, + ) + return list(mapMaybe(structured_gen.gen_one, g.functions())) + + def gen_unstructured( + self, f: NativeFunction, g: NativeFunctionsGroup | None = None + ) -> str | None: + with native_function_manager(f): + inplace_meta = False + gets_out_inplace_wrapper = False + if not self.backend_index.has_kernel(f): + if ( + self.backend_index.dispatch_key == DispatchKey.Meta + and f.func.kind() is SchemaKind.inplace + and + # Defer to composites for meta implementation + not f.has_composite_kernel + and + # Inplace list operations are not supported + len(f.func.returns) == 1 + ): + inplace_meta = True + elif ( + not self.backend_index.use_out_as_primary + and g is not None + and gets_generated_out_inplace_wrapper(f, g, self.backend_index) + ): + # We want to generate inplace/out wrappers, that don't have a kernel for the backend. 
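+                    # Example (hypothetical backend): if only the functional
+                    # kernel exists, the generated out= wrapper calls it and
+                    # copies the result back with at::_copy_from_and_resize
+                    # (see gen_out_inplace_wrapper above).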
+ gets_out_inplace_wrapper = True + else: + return None + if f.manual_kernel_registration: + return None + + if ( + self.target is Target.REGISTRATION + and not self.selector.is_native_function_selected(f) + ): + return None + + sig = self.wrapper_kernel_sig(f) + + name = sig.name() + returns_type = sig.returns_type().cpp_type() + args = sig.arguments() + args_str = ", ".join(a.defn() for a in args) + + # See Note [Direct dispatch bindings] + cpp_sig_group = CppSignatureGroup.from_native_function( + f, method=False, fallback_binding=False + ) + + # TODO: dedupe this with the structured codegen + if self.target is Target.NAMESPACED_DECLARATION: + result = "" + for cpp_sig in cpp_sig_group.signatures(symint=self.symint): + result += f"TORCH_API {cpp_sig.decl()};\n" + return result + elif self.target is Target.NAMESPACED_DEFINITION: + + def generate_defn(cpp_sig: CppSignature) -> str: + return f""" +{cpp_sig.defn()} {{ +return {sig.name()}({", ".join(e.expr for e in translate(cpp_sig.arguments(), sig.arguments()))}); +}} +""" + + result = "" + for cpp_sig in cpp_sig_group.signatures(symint=self.symint): + result += generate_defn(cpp_sig) + return result + + elif self.target is Target.ANONYMOUS_DEFINITION: + # short circuit for inplace_meta + if inplace_meta: + assert f.func.arguments.self_arg is not None + self_arg_name = f.func.arguments.self_arg.argument.name + # TODO: handle in place on tensor list + return f""" +{returns_type} {name}({args_str}) {{ + TORCH_CHECK_NOT_IMPLEMENTED({self_arg_name}.is_meta(), + "Cannot inplace into non-meta tensor with meta tensor argument"); + return {self_arg_name}; +}} +""" + + # short circuit for generated inplace/out wrappers + if gets_out_inplace_wrapper: + return self.gen_out_inplace_wrapper(f, g) + + metadata = self.backend_index.get_kernel(f) + if metadata is None: + return None + if self.class_method_name is None: + impl_name = f"{metadata.cpp_namespace}::{metadata.kernel}" + else: + impl_name = f"{metadata.cpp_namespace}::{self.class_method_name}::{metadata.kernel}" + + kernel_sig = kernel_signature(f, self.backend_index) + + args_exprs_str = ", ".join( + e.expr + for e in translate( + sig.arguments(), kernel_sig.arguments(), method=False + ) + ) + + device_check = " // No device check\n" + # Backends that require device guards presumably also require device checks. + if self.backend_index.device_guard: + device_check_args = itertools.chain( + f.func.arguments.out, f.func.arguments.flat_positional + ) + device_check = RegisterDispatchKey.gen_device_check( + f.device_check, list(device_check_args), name + ) + + device_guard = "// DeviceGuard omitted" # default + if f.device_guard and self.backend_index.device_guard: + has_tensor_options = any( + isinstance(a, TensorOptionsArguments) + for a in f.func.arguments.non_out + ) + if has_tensor_options: + # kernel is creating a tensor + device_guard = """ + const DeviceGuard device_guard(device_or_default(device));""" + + # CUDA requires special handling + if is_cuda_dispatch_key(self.backend_index.dispatch_key): + device_guard = f"globalContext().lazyInitDevice(c10::DeviceType::CUDA);\n{device_guard}" + else: + # kernel is operating on existing tensors + + # There is precedence for which argument we use to do + # device guard. This describes the precedence order. 
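+                    # Precedence sketch: `self` is consulted first, then the
+                    # out= arguments, then the remaining positional tensors;
+                    # the first tensor-like candidate supplies the device.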
+ self_arg = ( + [f.func.arguments.self_arg.argument] + if f.func.arguments.self_arg is not None + else [] + ) + candidate_args = itertools.chain( + self_arg, + f.func.arguments.out, + f.func.arguments.flat_positional, + ) + + # Only tensor like arguments are eligible + device_of = next( + ( + f"{a.name}" + for a in candidate_args + if a.type.is_tensor_like() + ), + None, + ) + if device_of is not None: + device_guard = f"const OptionalDeviceGuard device_guard(device_of({device_of}));" + + return f"""\ +namespace {{ + +{returns_type} {name}({args_str}) {{ + {device_check} + + {device_guard} + return {impl_name}({args_exprs_str}); +}} + +}} // anonymous namespace +""" + + elif self.target is Target.REGISTRATION: + if f.manual_kernel_registration or self.skip_dispatcher_op_registration: + return None + else: + payload = f"TORCH_FN({name})" + return f'm.impl("{f.func.name}",\n{payload});\n' + else: + assert_never(self.target) + + +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # +# +# STRUCTURED +# +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # + + +@dataclass(frozen=True) +class StructuredRegisterDispatchKey(RegisterDispatchKey): + g: NativeFunctionsGroup + + def gen_class_set_output_functions( + self, k: SchemaKind, parent_class: str, generate_super: bool + ) -> str: + if generate_super: + set_output_super = f"{parent_class}::set_output_raw_strided(output_idx, sizes, strides, options, names);" + else: + set_output_super = "" + + def gen_set_output_function(name: str, maybe_create_proxy: bool) -> str: + return f""" +void set_output_{name}( + int64_t output_idx, IntArrayRef sizes, IntArrayRef strides, + TensorOptions options, DimnameList names +) override {{ +{textwrap.indent(self.gen_class_set_output_body(k, maybe_create_proxy), " ")} + if (!names.empty()) {{ + namedinference::propagate_names(outputs_[output_idx], names); + }} + // super must happen after, so that downstream can use maybe_get_output + // to retrieve the output +{textwrap.indent(set_output_super, " ")} +}} +""" + + return f""" +{gen_set_output_function("strided", maybe_create_proxy=True)} +{gen_set_output_function("raw_strided", maybe_create_proxy=False)} +""" + + def gen_class_set_output_body(self, k: SchemaKind, maybe_create_proxy: bool) -> str: + if self.backend_index.dispatch_key in [ + DispatchKey.CUDA, + DispatchKey.MPS, + DispatchKey.XPU, + DispatchKey.CompositeExplicitAutogradNonFunctional, + ]: + maybe_set_guard = """ +auto current_device = guard_.current_device(); +if (C10_UNLIKELY(current_device.has_value())) { + TORCH_INTERNAL_ASSERT(*current_device == options.device(), + "structured kernels don't support multi-device outputs"); +} else { + guard_.reset_device(options.device()); +} +""" + maybe_set_guard_line = maybe_set_guard + "\n" + else: + maybe_set_guard_line = maybe_set_guard = "" + + if maybe_create_proxy: + create_proxy = """ +auto maybe_proxy = maybe_create_proxy(out, sizes, strides, options); +if (C10_UNLIKELY(maybe_proxy.has_value())) { + proxy_outputs_[output_idx] = std::move(maybe_proxy).value(); +} +""" + else: + create_proxy = "" + + if k is SchemaKind.functional: + assert self.backend_index.dispatch_key in ( + DispatchKey.Meta, + DispatchKey.CPU, + DispatchKey.CUDA, + DispatchKey.MPS, + DispatchKey.XPU, + DispatchKey.MTIA, + DispatchKey.CompositeExplicitAutogradNonFunctional, + ) + return f"""{maybe_set_guard_line} +outputs_[output_idx] = create_out(sizes, strides, options);""" + elif k is SchemaKind.inplace: + return f"""{maybe_set_guard_line} +const auto& 
out = outputs_[output_idx].get();
+check_inplace(out, sizes, options);
+{create_proxy}"""
+        elif k is SchemaKind.out:
+            return f"""{maybe_set_guard_line}
+const auto& out = outputs_[output_idx].get();
+resize_out(out, sizes, strides, options);
+{create_proxy}"""
+        elif k is SchemaKind.mutable or k is SchemaKind.scratch:
+            raise AssertionError(
+                f"{k} structured operators are currently not supported"
+            )
+        else:
+            assert_never(k)
+
+    # returns the definition of a ctor, as well as how to construct
+    # this class to a variable named op
+    def gen_class_ctor(self, k: SchemaKind, class_name: str, returns: int) -> str:
+        if k is SchemaKind.functional:
+            return ""
+        elif k is SchemaKind.inplace:
+            # TODO: Make sure out argument is guaranteed to be self
+            return f"{class_name}(Tensor& self) : outputs_{{std::ref(self)}} {{}}"
+        elif k is SchemaKind.out:
+            out_args = ", ".join(f"Tensor& out{i}" for i in range(returns))
+            out_refs = ", ".join(f"std::ref(out{i})" for i in range(returns))
+            return f"{class_name}({out_args}) : outputs_{{ {out_refs} }} {{}}"
+        elif k is SchemaKind.mutable or k is SchemaKind.scratch:
+            raise AssertionError(
+                f"{k} structured operators are currently not supported"
+            )
+        else:
+            assert_never(k)
+
+    def gen_class(
+        self,
+        f: NativeFunction,
+        k: SchemaKind,
+        *,
+        class_name: str,
+        parent_class: str,
+        generate_super: bool,
+    ) -> str:
+        if k is SchemaKind.functional:
+            output_type = "Tensor"
+            output_value = "outputs_[output_idx]"
+            proxy_field = ""
+        elif k is SchemaKind.inplace:
+            output_type = "std::reference_wrapper<Tensor>"
+            output_value = "proxy_outputs_[output_idx].has_value() ? *proxy_outputs_[output_idx] : outputs_[output_idx].get()"
+            proxy_field = f"std::array<::std::optional<Tensor>, {len(f.func.returns)}> proxy_outputs_;"
+        elif k is SchemaKind.out:
+            output_type = "std::reference_wrapper<Tensor>"
+            output_value = "proxy_outputs_[output_idx].has_value() ? *proxy_outputs_[output_idx] : outputs_[output_idx].get()"
+            proxy_field = f"std::array<::std::optional<Tensor>, {len(f.func.returns)}> proxy_outputs_;"
+        else:
+            raise RuntimeError(f"Unsupported SchemaKind {k}")
+
+        if self.backend_index.dispatch_key == DispatchKey.CUDA:
+            if self.rocm:
+                guard_field = "c10::hip::OptionalHIPGuardMasqueradingAsCUDA guard_;"
+            else:
+                guard_field = "c10::cuda::OptionalCUDAGuard guard_;"
+        elif (
+            self.backend_index.dispatch_key
+            == DispatchKey.CompositeExplicitAutogradNonFunctional
+        ):
+            guard_field = "c10::OptionalDeviceGuard guard_;"
+        elif self.backend_index.dispatch_key == DispatchKey.MPS:
+            # TODO: Move to OptionalMPSGuard.
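+            # (MPS, XPU, MTIA and the composite key below all reuse the
+            # generic c10::OptionalDeviceGuard; only CUDA/ROCm get a
+            # backend-specific guard type.)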
+ guard_field = "c10::OptionalDeviceGuard guard_;" + elif self.backend_index.dispatch_key == DispatchKey.XPU: + guard_field = "c10::OptionalDeviceGuard guard_;" + elif self.backend_index.dispatch_key == DispatchKey.MTIA: + guard_field = "c10::OptionalDeviceGuard guard_;" + else: + guard_field = "" + + indent = " " * 4 + class_ctor_str = self.gen_class_ctor(k, class_name, len(f.func.returns)) + lines = ( + f"struct {class_name} final : public {parent_class} {{", + f"{textwrap.indent(class_ctor_str, indent)}", + f"{textwrap.indent(self.gen_class_set_output_functions(k, parent_class, generate_super), indent)}", + " const Tensor& maybe_get_output(int64_t output_idx) override {", + f" return {output_value};\n", # type: ignore[possibly-undefined] # TODO: audit + " }", + # type: ignore[possibly-undefined] # TODO: audit + f" std::array<{output_type}, {len(f.func.returns)}> outputs_;", + f"{textwrap.indent(proxy_field, indent)}", # type: ignore[possibly-undefined] # TODO: audit + f"{textwrap.indent(guard_field, indent)}", + "};", + ) + return "\n".join(line for line in lines if line) + + @method_with_native_function + def gen_one(self, f: NativeFunction) -> str | None: + assert not f.manual_kernel_registration + + if ( + self.target is Target.REGISTRATION + and not self.selector.is_native_function_selected(f) + ): + return None + + # TODO: Now, there is something interesting going on here. In the code below, + # we generate CompositeExplicitAutogradNonFunctional implementations of functional and inplace + # based on the out implementation. But in fact, out is definable by + # functional too (just not very efficiently), and this is honestly the + # MORE likely situation for a backend implementer. How do we pick? + # Well, taking a page from Haskell type classes and default methods, + # we could conceivably register a circular definition (out in terms + # of functional, and functional in terms of out) and just require + # someone to implement one or the other. We'd have to do a little bit + # of work to not register one of these "weak" definitions unless there + # is a strong definition somewhere in the DAG! So it's not implemented yet. + if ( + self.backend_index.dispatch_key + == DispatchKey.CompositeExplicitAutogradNonFunctional + and f.func.kind() is SchemaKind.out + ): + # Never generate a default implementation for out, that's what you + # have to define as a backend implementer + return None + + # Note [Direct dispatch bindings] + # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + # Signature of the non-dispatched function we'll expose in a header + # (e.g., at::cpu::add). We don't generate methods (TODO: do this + # when CPUTensor class is a thing); nor do we generate fallback + # bindings for manual_cpp_binding functions. 
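+        # Sketch (signature illustrative, not generated verbatim): for
+        # add.Tensor under CPU this exposes a direct binding such as
+        #   TORCH_API at::Tensor add(const at::Tensor& self, const at::Tensor& other, const at::Scalar& alpha);
+        # in the at::cpu namespace, bypassing the dispatcher entirely.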
+ cpp_sig_group = CppSignatureGroup.from_native_function( + f, method=False, fallback_binding=False + ) + + # Signature of the wrapper function we'll register to the dispatcher + kern = self.backend_index.get_kernel(f) + sig = NativeSignature( + f.func, + prefix=f"wrapper_{self.backend_index.dispatch_key}_", + symint=kern is not None and kern.supports_symint(), + ) + + if self.target is Target.NAMESPACED_DECLARATION: + result = "" + for cpp_sig in cpp_sig_group.signatures(symint=self.symint): + result += f"TORCH_API {cpp_sig.decl()};\n" + return result + + elif self.target is Target.NAMESPACED_DEFINITION: + + def generate_defn(cpp_sig: CppSignature) -> str: + return f""" +{cpp_sig.defn()} {{ +return {sig.name()}({", ".join(e.expr for e in translate(cpp_sig.arguments(), sig.arguments()))}); +}} +""" + + result = "" + for cpp_sig in cpp_sig_group.signatures(symint=self.symint): + result += generate_defn(cpp_sig) + return result + + elif self.target is Target.ANONYMOUS_DEFINITION: + k = f.func.kind() + + # Construct the body of the wrapper function with signature sig + sig_body = [] + # We'll use context to keep track of any variables we've brought + # into scope while generating code + context: list[Binding | Expr] = list(sig.arguments()) + + # Initialize the class corresponding to this structured + # operator; feeding it the output argument(s) if it is known + if self.backend_index.dispatch_key is DispatchKey.Meta: + class_name = f"structured_{meta.name(self.g)}_meta_{k.name}" + parent_class = f"at::meta::structured_{meta.name(self.g)}" + elif ( + self.backend_index.dispatch_key + is DispatchKey.CompositeExplicitAutogradNonFunctional + ): + # TODO: dedup this branch + class_name = f"structured_{meta.name(self.g)}_default_backend_{k.name}" + parent_class = f"at::meta::structured_{meta.name(self.g)}" + else: + metadata = self.backend_index.get_kernel(self.g) + assert metadata is not None + class_name = f"structured_{metadata.kernel}_{k.name}" + parent_class = f"{metadata.cpp_namespace}::structured_{metadata.kernel}" + + if self.backend_index.device_guard: + device_check_args = itertools.chain( + f.func.arguments.out, f.func.arguments.flat_positional + ) + sig_body.append( + RegisterDispatchKey.gen_device_check( + f.device_check, list(device_check_args), sig.name() + ) + ) + + if k is SchemaKind.functional: + sig_body.append(f"{class_name} op;") + elif k is SchemaKind.inplace: + sig_body.append(f"{class_name} op(self);") + elif k is SchemaKind.out: + out_args_str = ", ".join(a.name for a in f.func.arguments.out) + sig_body.append(f"{class_name} op({out_args_str});") + + # Translate the input native arguments into structured + # arguments for the meta call + meta_exprs = ", ".join( + e.expr + for e in translate( + context, structured.meta_arguments(self.g), method=False + ) + ) + + if self.g.out.precomputed: + # If this function group has precomputed elements, the meta function + # returns a struct containing them which must be saved so that it + # can be unpacked when generating code to call the impl. + sig_body.append(f"auto precompute = op.meta({meta_exprs});") + + # Put all of the contents of the precompute struct into the context + # so that translate will be able to return the correct args for the + # call to the impl. 
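+                # Hypothetical example: if meta() precomputed a `dim` value,
+                # the loop below adds Expr("precompute.dim", ...) to the
+                # context so translate() can hand it to the impl call.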
+ precomputed_values = [ + *self.g.out.precomputed.replace.values(), + self.g.out.precomputed.add, + ] + for precomputed_elems in precomputed_values: + context.extend( + Expr( + expr=f"precompute.{arg.name}", + type=structured.argument_type(arg, binds=arg.name), + ) + for arg in precomputed_elems + ) + + # Add a use of the precompute struct so FB internal compilers don't + # complain that there is an unused variable. + sig_body.append("(void)precompute;") + else: + sig_body.append(f"op.meta({meta_exprs});") + + # After running meta, op.outputs_ is guaranteed to be valid; + # add it to the context + out_args = structured.out_arguments(self.g) + for i, out_arg in enumerate(out_args): + assert ConstRefCType(BaseCType(tensorT)) == out_arg.nctype.type + + if k is SchemaKind.out: + expr = f"op.maybe_get_output({i})" + else: + expr = f"op.outputs_[{i}]" + + context.append( + Expr( + expr=expr, + # TODO: Stop hardcoding that the output type is a Tensor. Note + # that for the codegen here this is fine because outputs_ is + # hardcoded to be tensor already + type=NamedCType( + out_arg.nctype.name, MutRefCType(BaseCType(tensorT)) + ), + ) + ) + + # With the expanded context, do the impl call (if not a meta + # function) + if ( + self.backend_index.dispatch_key + == DispatchKey.CompositeExplicitAutogradNonFunctional + ): + # TODO: https://github.com/pytorch/pytorch/issues/53023 + out_sig_group = CppSignatureGroup.from_native_function( + self.g.out, method=False, fallback_binding=f.manual_cpp_binding + ) + out_sig = out_sig_group.most_faithful_signature() + api_name = out_sig.name() + out_exprs = ", ".join( + e.expr + for e in translate(context, out_sig.arguments(), method=False) + ) + # TODO: I think this means structured won't work with method + # only functions (but maybe you're saved by faithful? iunno.) + # NB: Originally I wrote this as an at::redispatch call, but + # I got in trouble because that meant I needed a DispatchKeySet + # in the wrapper function, which meant I needed a DispatchKeySet + # in the DispatchKeyFunctions declarations, but the defined API + # there does NOT permit a dispatch key set. I think you can + # probably unwind this by calling some function to do the TLS + # fetch and get the DispatchKeySet when you don't have it, but + # I didn't do it for this version + sig_body.append(f"at::{api_name}({out_exprs});") + elif self.backend_index.dispatch_key != DispatchKey.Meta: + impl_exprs = ", ".join( + e.expr + for e in translate( + context, structured.impl_arguments(self.g), method=False + ) + ) + sig_body.append(f"op.impl({impl_exprs});") + + # Go over each output, and check if there is a proxy created for it. + # If so, copy it over to the original output. 
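+            # A proxy output exists only when the caller's out= tensor had
+            # mismatched strides (see maybe_create_proxy); the copy_ below
+            # writes the correctly-strided result back into it.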
+ if k is SchemaKind.out or k is SchemaKind.inplace: + for i in range(len(f.func.returns)): + sig_body.append( + f"if (op.proxy_outputs_[{i}].has_value()) op.outputs_[{i}].get().copy_(*op.proxy_outputs_[{i}]);" + ) + + # Destructively return the final tensors + # TODO: Do this in translate instead + if k is SchemaKind.functional: + if len(f.func.returns) == 1: + ret_expr = "std::move(op.outputs_[0])" # small optimization + else: + moved = ", ".join( + f"std::move(op.outputs_[{i}])" + for i in range(len(f.func.returns)) + ) + ret_expr = f"std::make_tuple({moved})" + elif k is SchemaKind.inplace: + ret_expr = "self" + elif k is SchemaKind.out: + if len(f.func.returns) == 1: + ret_expr = f.func.arguments.out[0].name + else: + refs = ", ".join(a.name for a in f.func.arguments.out) + ret_expr = f"std::forward_as_tuple({refs})" + sig_body.append(f"return {ret_expr};") # type: ignore[possibly-undefined] # TODO: audit + + sig_body_str = "\n".join(sig_body) + + # For an overview of what this template code looks like, see + # https://github.com/pytorch/rfcs/pull/9 + return f"""\ +{ + self.gen_class( + f, + k, + class_name=class_name, + parent_class=parent_class, + generate_super=self.g.out.structured_inherits is not None, + ) + } + +{sig.defn()} {{ +{sig_body_str} +}} +""" + + elif self.target is Target.REGISTRATION: + return f'm.impl("{f.func.name}", TORCH_FN({sig.name()}));' + else: + assert_never(self.target) + # Silence mypy's "Missing return statement" error + return None diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/ufunc.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/ufunc.py new file mode 100644 index 0000000000000000000000000000000000000000..045d8de110e7442d0732aee483f0aab7015140d7 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/dest/ufunc.py @@ -0,0 +1,553 @@ +from __future__ import annotations + +from dataclasses import dataclass +from typing import TYPE_CHECKING + +import torchgen.api.ufunc as ufunc +from torchgen.api.translate import translate +from torchgen.api.types import ( + BaseCType, + Binding, + CType, + Expr, + NamedCType, + opmath_t, + scalar_t, + StructuredImplSignature, + VectorizedCType, +) +from torchgen.context import with_native_function +from torchgen.model import ( + Argument, + BaseTy, + BaseType, + DispatchKey, + NativeFunctionsGroup, + ScalarType, + UfuncKey, +) +from torchgen.utils import OrderedSet + + +if TYPE_CHECKING: + from collections.abc import Sequence + + from torchgen.api.ufunc import UfunctorBindings + + +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # +# +# CUDA STUFF +# +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # + +# NB: not bothering to generate dispatch stub forward declaration in header, +# we can just paste it wherever necessary + +# TODO: use BackendIndex +# dispatch_key: DispatchKey # only CPU/CUDA right now + + +# Represents functors for implementing CUDA ufuncs. +# Functors are templated by scalar_t because when USERS instantiate functors +# they are templated. 
A functor looks something like this: +# +# template +# struct CUDAFunctorOnSelf_add { +# using opmath_t = at::opmath_type; +# opmath_t other_; +# opmath_t alpha_; +# CUDAFunctorOnSelf_add(opmath_t other, opmath_t alpha) +# : other_(other), alpha_(alpha) {} +# __device__ scalar_t operator()(scalar_t self) { +# return ufunc::add(static_cast(self), other_, alpha_); +# } +# }; +# +@dataclass(frozen=True) +class UfunctorSignature: + g: NativeFunctionsGroup + scalar_tensor_idx: int | None + name: str + + def arguments(self) -> UfunctorBindings: + return ufunc.ufunctor_arguments( + self.g, scalar_tensor_idx=self.scalar_tensor_idx, scalar_t=scalar_t + ) + + def fields(self) -> list[Binding]: + # fields are renamed to have a trailing underscore, as is conventional + return [b.rename(f"{b.name}_") for b in self.arguments().ctor] + + def returns_type(self) -> CType: + # TODO: don't hardcode; return type will be inferred based on tags on + # the native function + return BaseCType(scalar_t) + + def decl_fields(self) -> str: + return "\n".join(f"{f.type} {f.name};" for f in self.fields()) + + def inline_defn_ctor(self) -> str: + args_str = ", ".join(a.decl() for a in self.arguments().ctor) + # NB: hypothetically could do this with translate but the + # transition here is very regular + init_str = ", ".join(f"{a.name}_({a.name})" for a in self.arguments().ctor) + return f"{self.name}({args_str}) : {init_str} {{}}" + + def decl_apply(self) -> str: + args_str = ", ".join(a.decl() for a in self.arguments().apply) + return f"{self.returns_type().cpp_type()} operator()({args_str}) const" + + +@dataclass(frozen=True) +class UfuncSignature: + g: NativeFunctionsGroup + name: str + compute_t: CType + + def arguments(self) -> list[Binding]: + return ufunc.ufunc_arguments(self.g, compute_t=self.compute_t) + + def call(self, ctx: Sequence[Binding | Expr]) -> str: + return f"{self.name}({', '.join(a.expr for a in translate(ctx, self.arguments()))})" + + +# steps: +# 1. take the functional signature +# 2. use api.ufunc to convert it to template signature. this establishes +# the type of the template function +# 3. use api.ufunc (II) to generate a split struct / operator() signature. +# this establish context in which we call the template signature +# +# StructuredImplSignature context +# ~> functor constructor sig +# +# Functor constructor context +# ~> functor fields sig +# +# Functor apply context (functor fields + functor apply sig) +# ~> template sig +# + + +def eligible_for_binary_scalar_specialization(g: NativeFunctionsGroup) -> bool: + num_tensors = sum( + 1 for a in g.functional.func.arguments.flat_non_out if a.type.is_tensor_like() + ) + return num_tensors == 2 + + +def compute_ufunc_cuda_functors( + g: NativeFunctionsGroup, +) -> tuple[dict[ScalarType, dict[UfuncKey, UfunctorSignature]], str]: + # First, build the functors. 
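+    # Sketch for a binary op such as add (names illustrative): up to three
+    # functors are built per supported dtype, e.g. CUDAFunctorOnSelf_add,
+    # CUDAFunctorOnOther_add and CUDAFunctor_add, indexed first by
+    # ScalarType and then by UfuncKey.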
+    ufunctor_sigs: dict[ScalarType, dict[UfuncKey, UfunctorSignature]] = {}
+    ufunctors: list[str] = []
+    loops = g.out.ufunc_inner_loop
+    scalar_tensor_idx_lookup = {
+        UfuncKey.CUDAFunctorOnSelf: 1,
+        UfuncKey.CUDAFunctorOnOther: 0,
+        UfuncKey.CUDAFunctor: None,
+    }
+    if eligible_for_binary_scalar_specialization(g):
+        keys = [
+            UfuncKey.CUDAFunctorOnSelf,
+            UfuncKey.CUDAFunctorOnOther,
+            UfuncKey.CUDAFunctor,
+        ]
+    else:
+        keys = [UfuncKey.CUDAFunctor]
+        for k in [UfuncKey.CUDAFunctorOnSelf, UfuncKey.CUDAFunctorOnOther]:
+            assert k not in loops, f"cannot use {k} on non-binary function"
+    for k in keys:
+        # If the key was directly defined, skip functor codegen; we assume the
+        # user has already done it for us
+        if k in loops:
+            ufunctor_sig = UfunctorSignature(
+                g, scalar_tensor_idx=scalar_tensor_idx_lookup[k], name=loops[k].name
+            )
+            for dtype in loops[k].supported_dtypes:
+                ufunctor_sigs.setdefault(dtype, {})[k] = ufunctor_sig
+            continue
+
+        # Note [ScalarOnly and Generic must match names for CUDA]
+        # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+        # Otherwise, look in ANY of the generic entries. For simplicity of
+        # codegen, if both ScalarOnly and Generic are defined, the ufunc name
+        # must match (if they didn't match, we'd have to generate distinct
+        # functors per dtype, which is awful, so we're not going to do it unless
+        # someone really forces us to)
+        ufunc_name = None
+        supported_dtypes: OrderedSet[ScalarType] = OrderedSet()
+        for lk in [UfuncKey.ScalarOnly, UfuncKey.Generic]:
+            if lk not in loops:
+                continue
+            if ufunc_name is None:
+                ufunc_name = loops[lk].name
+            else:
+                # See Note [ScalarOnly and Generic must match names for CUDA]
+                assert ufunc_name == loops[lk].name, (
+                    "ScalarOnly and Generic must have same ufunc name"
+                )
+            supported_dtypes |= loops[lk].supported_dtypes
+        assert ufunc_name is not None
+
+        name = f"{k}_{ufunc_name}"
+        ufunctor_sig = UfunctorSignature(
+            g, scalar_tensor_idx=scalar_tensor_idx_lookup[k], name=name
+        )
+        for dtype in supported_dtypes:
+            ufunctor_sigs.setdefault(dtype, {})[k] = ufunctor_sig
+
+        ufunc_sig = UfuncSignature(
+            g, name=f"ufunc::{ufunc_name}", compute_t=BaseCType(opmath_t)
+        )
+        apply_ctx = ufunctor_sig.fields() + ufunctor_sig.arguments().apply
+        ufunctors.append(
+            f"""
+template <typename scalar_t>
+struct {ufunctor_sig.name} {{
+  using opmath_t = at::opmath_type<scalar_t>;
+  {ufunctor_sig.decl_fields()}
+  {ufunctor_sig.inline_defn_ctor()}
+  __device__ {ufunctor_sig.decl_apply()} {{
+    return {ufunc_sig.call(apply_ctx)};
+  }}
+}};
+"""
+        )
+
+    return ufunctor_sigs, "\n".join(ufunctors)
+
+
+@dataclass(frozen=True)
+class BinaryScalarSpecializationConfig:
+    scalar_idx: int
+    ctor_tensor: str
+    ufunc_key: UfuncKey
+
+
+BinaryScalarSpecializationConfigs = [
+    BinaryScalarSpecializationConfig(
+        scalar_idx=0,
+        ctor_tensor="self",
+        ufunc_key=UfuncKey.CUDAFunctorOnOther,
+    ),
+    BinaryScalarSpecializationConfig(
+        scalar_idx=1,
+        ctor_tensor="other",
+        ufunc_key=UfuncKey.CUDAFunctorOnSelf,
+    ),
+]
+
+
+def compute_ufunc_cuda_dtype_body(
+    g: NativeFunctionsGroup,
+    dtype: ScalarType,
+    inner_loops: dict[UfuncKey, UfunctorSignature],
+    parent_ctx: Sequence[Binding],
+) -> str:
+    body = "using opmath_t = at::opmath_type<scalar_t>;"
+    body += "if (false) {}\n"  # for ease of codegen
+    for config in BinaryScalarSpecializationConfigs:
+        if config.ufunc_key not in inner_loops:
+            continue
+        ufunctor_sig = inner_loops[config.ufunc_key]
+        scalar_idx = config.scalar_idx + 1
+        # Make a copy and at the same time widen the type (not permissible
+        # without copy; we don't want to 
mutate the input argument anyway) + ctx: list[Expr | Binding] = list(parent_ctx) + ctx.append( + Expr( + expr=f"iter.scalar_value({scalar_idx})", + type=NamedCType(config.ctor_tensor, BaseCType(opmath_t)), + ) + ) + ufunctor_ctor_exprs_str = ", ".join( + a.expr for a in translate(ctx, ufunctor_sig.arguments().ctor) + ) + + # NB: ufunctor must be allocated before iter.remove_operand is called, + # as it relies on iter + body += f"""\ +else if (iter.is_cpu_scalar({scalar_idx})) {{ + {ufunctor_sig.name} ufunctor({ufunctor_ctor_exprs_str}); + iter.remove_operand({scalar_idx}); + gpu_kernel(iter, ufunctor); +}}""" + + ufunctor_sig = inner_loops[UfuncKey.CUDAFunctor] + ufunctor_ctor_exprs_str = ", ".join( + a.expr for a in translate(parent_ctx, ufunctor_sig.arguments().ctor) + ) + body += f""" +else {{ + gpu_kernel(iter, {ufunctor_sig.name}({ufunctor_ctor_exprs_str})); +}} + """ + return body + + +@with_native_function +def compute_ufunc_cuda(g: NativeFunctionsGroup) -> str: + # First, build the functors, indexing them by dtype + ufunctor_sigs, ufunctors = compute_ufunc_cuda_functors(g) + + # Next, build the conditionals + sig = StructuredImplSignature(g, ufunc.kernel_name(g, DispatchKey.CUDA)) + dtype_cases = [] + for dtype, inner_ufunc_sigs in ufunctor_sigs.items(): + dtype_cases.append( + f""" +AT_DISPATCH_CASE(at::ScalarType::{dtype}, + [&]() {{ + {compute_ufunc_cuda_dtype_body(g, dtype, inner_ufunc_sigs, sig.arguments())} + }} +) +""" + ) + + dtype_cases_str = "\n".join(dtype_cases) + + stub_sig = StubSignature(g) + + return f""" +{ufunctors} + +{stub_sig.type_defn()}; +{stub_sig.dispatch_decl()} + +{stub_sig.kernel_defn()} {{ + AT_DISPATCH_SWITCH(iter.common_dtype(), "{sig.name}", + {dtype_cases_str} + ); +}} +REGISTER_DISPATCH({stub_sig.name}, &{stub_sig.kernel_name}) + +{sig.defn()} {{ + {stub_sig.direct_call(sig.arguments())}; +}} +""" + + +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # +# +# CPU STUFF +# +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # + + +@dataclass(frozen=True) +class StubSignature: + g: NativeFunctionsGroup + + @property + def name(self) -> str: + return f"{str(self.g.functional.func.name.name)}_stub" + + @property + def kernel_name(self) -> str: + return f"{str(self.g.functional.func.name.name)}_kernel" + + @property + def type_name(self) -> str: + return f"{str(self.g.functional.func.name.name)}_fn" + + def arguments(self) -> list[Binding]: + return ufunc.stub_arguments(self.g) + + def type(self) -> str: + cpp_args = self.arguments() + return f"void(*)(TensorIteratorBase&, {', '.join(a.type for a in cpp_args)})" + + def dispatch_decl(self) -> str: + return f"DECLARE_DISPATCH({self.type_name}, {self.name})" + + def dispatch_defn(self) -> str: + return f"DEFINE_DISPATCH({self.name})" + + def kernel_defn(self) -> str: + return f"void {self.kernel_name}(TensorIteratorBase& iter, {', '.join(a.defn() for a in self.arguments())})" + + def type_defn(self) -> str: + return f"using {self.type_name} = {self.type()}" + + # must be called from context where this is TensorIteratorBase* + def call(self, ctx: Sequence[Binding]) -> str: + return f"{self.name}(device_type(), *this, {', '.join(a.expr for a in translate(ctx, self.arguments()))})" + + # used in CUDA to skip the unnecessary dynamic dispatch + def direct_call(self, ctx: Sequence[Binding]) -> str: + return f"{self.kernel_name}(*this, {', '.join(a.expr for a in translate(ctx, self.arguments()))})" + + +@with_native_function +def compute_ufunc_cpu(g: 
NativeFunctionsGroup) -> str: + stub_sig = StubSignature(g) + sig = StructuredImplSignature(g, ufunc.kernel_name(g, DispatchKey.CPU)) + + return f""" +{stub_sig.type_defn()}; +{stub_sig.dispatch_decl()} +{stub_sig.dispatch_defn()}; + +{sig.defn()} {{ + {stub_sig.call(sig.arguments())}; +}} +""" + + +def compute_ufunc_cpu_dtype_body( + g: NativeFunctionsGroup, + dtype: ScalarType, + inner_loops: dict[UfuncKey, UfuncSignature], + parent_ctx: Sequence[Binding], +) -> str: + assert UfuncKey.CPUScalar in inner_loops, f"{dtype}, {inner_loops.keys()}" + assert inner_loops.keys() <= {UfuncKey.CPUScalar, UfuncKey.CPUVector} + scalar_loop = inner_loops[UfuncKey.CPUScalar] + vec_loop = None + if UfuncKey.CPUVector in inner_loops: + vec_loop = inner_loops[UfuncKey.CPUVector] + + # NB: We DON'T use translate here, because translate is + # incapable of CSE'ing the scalar accesses in case it is also + # used by Vectorized; also, the unpacking here is very simple + # and only affects Scalar; everything else is implicitly captured + # by the lambda + + # Setup scalar in scope + body = [] + ctx = [] + for b in parent_ctx: + if isinstance(b.argument, Argument) and b.argument.type != BaseType( + BaseTy.Scalar + ): + continue + body.append(f"auto _s_{b.name} = {b.name}.to();") + ctx.append(Expr(f"_s_{b.name}", NamedCType(b.nctype.name, BaseCType(scalar_t)))) + if vec_loop is not None: + for b in parent_ctx: + if isinstance(b.argument, Argument) and b.argument.type != BaseType( + BaseTy.Scalar + ): + continue + body.append( + f"auto _v_{b.name} = at::vec::Vectorized(_s_{b.name});" + ) + ctx.append( + Expr( + f"_v_{b.name}", + NamedCType(b.nctype.name, VectorizedCType(BaseCType(scalar_t))), + ) + ) + + # Setup lambda signature + # NB: simplified version of ufunctor_arguments + scalar_bindings = [] + vec_bindings = [] + for a in g.functional.func.arguments.flat_non_out: + if not a.type.is_tensor_like(): + continue + assert a.type == BaseType(BaseTy.Tensor) + scalar_bindings.append( + Binding( + name=a.name, + nctype=NamedCType(a.name, BaseCType(scalar_t)), + argument=a, + ) + ) + if vec_loop is not None: + vec_bindings.append( + Binding( + name=a.name, + nctype=NamedCType(a.name, VectorizedCType(BaseCType(scalar_t))), + argument=a, + ) + ) + + def with_ctx(b: Sequence[Binding]) -> list[Expr | Binding]: + r: list[Expr | Binding] = [] + r.extend(ctx) + r.extend(b) + return r + + body_str = "\n".join(body) + if vec_loop is not None: + return f""" +{body_str} +cpu_kernel_vec(iter, + [=]({", ".join(b.decl() for b in scalar_bindings)}) {{ return {scalar_loop.call(with_ctx(scalar_bindings))}; }}, + [=]({", ".join(b.decl() for b in vec_bindings)}) {{ return {vec_loop.call(with_ctx(vec_bindings))}; }} +); +""" + else: + return f""" +{body_str} +cpu_kernel(iter, + [=]({", ".join(b.decl() for b in scalar_bindings)}) {{ return {scalar_loop.call(with_ctx(scalar_bindings))}; }} +); +""" + + +@with_native_function +def compute_ufunc_cpu_kernel(g: NativeFunctionsGroup) -> str: + stub_sig = StubSignature(g) + + # Reindex the ufunc by dtypes; processing generic/scalaronly as well + loops = g.out.ufunc_inner_loop + ufunc_sigs: dict[ScalarType, dict[UfuncKey, UfuncSignature]] = {} + for k in [UfuncKey.CPUScalar, UfuncKey.CPUVector]: + lks = [] + # ORDER MATTERS: this specifies overriding precedence + if k in loops: # should happen rarely + lks.append(k) + if UfuncKey.ScalarOnly in loops and k is UfuncKey.CPUScalar: + lks.append(UfuncKey.ScalarOnly) + if UfuncKey.Generic in loops: + lks.append(UfuncKey.Generic) + # TODO: don't 
hardcode ufunc:: namespace here, should be centralized smh + for lk in lks: + for dtype in loops[lk].supported_dtypes: + compute_t: CType + if k is UfuncKey.CPUScalar: + compute_t = BaseCType(scalar_t) + elif k is UfuncKey.CPUVector: + compute_t = VectorizedCType(BaseCType(scalar_t)) + else: + raise AssertionError + inner_ufunc_sigs = ufunc_sigs.setdefault(dtype, {}) + if k not in inner_ufunc_sigs: + inner_ufunc_sigs[k] = UfuncSignature( + g, name=f"ufunc::{loops[lk].name}", compute_t=compute_t + ) + + # Build the conditionals + dtype_cases = [] + for dtype, inner_ufunc_sigs in ufunc_sigs.items(): + dtype_cases.append( + f""" +AT_DISPATCH_CASE(at::ScalarType::{dtype}, + [&]() {{ + {compute_ufunc_cpu_dtype_body(g, dtype, inner_ufunc_sigs, stub_sig.arguments())} + }} +) +""" + ) + + dtype_cases_str = "\n".join(dtype_cases) + return f""" +namespace {{ + +{stub_sig.kernel_defn()} {{ + AT_DISPATCH_SWITCH(iter.common_dtype(), "{stub_sig.name}", + {dtype_cases_str} + ); +}} + +}} // anonymous namespace + +{stub_sig.type_defn()}; +{stub_sig.dispatch_decl()} +REGISTER_DISPATCH({stub_sig.name}, &{stub_sig.kernel_name}) +""" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..51c4eea41d787c3e6a028adac6da6215ffa4b31f Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/__pycache__/gen_mobile_upgraders.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/__pycache__/gen_mobile_upgraders.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..967ee990ec7bc5b34ef3bd6e2592c2f08aea2a36 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/__pycache__/gen_mobile_upgraders.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/__pycache__/gen_mobile_upgraders_constant.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/__pycache__/gen_mobile_upgraders_constant.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..d52fa6a6a1fd092194f4316571d2d501a4fae466 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/__pycache__/gen_mobile_upgraders_constant.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/gen_mobile_upgraders.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/gen_mobile_upgraders.py new file mode 100644 index 0000000000000000000000000000000000000000..15b74ac9c21a70d3f97df0dae210087072c15142 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/gen_mobile_upgraders.py @@ -0,0 +1,386 @@ +#!/usr/bin/env python3 + +from __future__ import annotations + +import os +from enum import Enum +from operator 
import itemgetter +from pathlib import Path +from typing import Any + +import torch +from torch.jit.generate_bytecode import generate_upgraders_bytecode +from torchgen.code_template import CodeTemplate +from torchgen.operator_versions.gen_mobile_upgraders_constant import ( + MOBILE_UPGRADERS_HEADER_DESCRIPTION, +) + + +class ByteCode(Enum): + instructions = 1 + constants = 2 + types = 3 + operators = 4 + register_size = 5 + + +EXCLUDED_OP_SET = [ + "aten::full.names", + "aten::full.out", + "aten::full", +] + +EXCLUE_UPGRADER_SET = ["full_0_4", "full_out_0_4"] + +ONE_INSTRUCTION = CodeTemplate( + """ + Instruction{OpCode::${operator_name}, ${X}, ${N}},""" +) + +INSTRUCTION_LIST = CodeTemplate( + """std::vector({ + ${instruction_list} + }), // instructions list""" +) + +ONE_CONSTANT = CodeTemplate( + """ + c10::IValue(${constant}),""" +) + +CONSTANT_LIST = CodeTemplate( + """std::vector({ + ${constant_list} + }), // constants list""" +) + +CONSTANTS_LIST_EMPTY = """std::vector(), // constants list""" + +ONE_TYPE = CodeTemplate("""c10::parseType("${type_str}"),""") + +TYPE_LIST = CodeTemplate( + """std::vector({ + ${type_list} + }), // types list""" +) + +TYPE_LIST_EMPTY = """std::vector(), // types list""" + +ONE_OPERATOTR_STRING = CodeTemplate( + """ + OperatorString({"${operator_name}", "${overload_name}", ${num_of_args}}),""" +) + +OPERATOR_STRING_LIST = CodeTemplate( + """ + std::vector({ + ${operator_string_list} + }), // operators list""" +) + +ONE_UPGRADER_FUNCTION = CodeTemplate( + """ + mobile::Function::registerFunc( + "${upgrader_name}", + ${instruction_list}, + ${constant_list}, + ${type_list}, + ${register_size} + )""" +) + +ONE_UPGRADER_SRC = CodeTemplate( + """ + ByteCodeFunctionWithOperator({ + ${bytecode_function}, + ${operator_string_list} + }),""" +) + + +ONE_UPGRADER_IN_VERSION_MAP = CodeTemplate( + """Upgrader({${upgrader_min_version}, ${upgrader_max_version}, "${upgrader_name}", ${bytecode_func_index}})""" +) # noqa: E501 + +ONE_OPERATOR_IN_VERSION_MAP = CodeTemplate( + """ + {std::string("${operator_name}"), + std::vector({ + ${upgrader_list_in_version_map} + })},""" +) + + +OPERATOR_VERSION_MAP = CodeTemplate( + """ +const std::unordered_map> +getOperatorVersionMapForMobile() { + static std::unordered_map> + operatorVersionMapForMobile({ + ${operator_list_in_version_map} + }); + return operatorVersionMapForMobile; +} +""" +) + + +UPGRADER_CPP_SRC = CodeTemplate( + MOBILE_UPGRADERS_HEADER_DESCRIPTION + + """ +#include +#include +#include + +namespace torch { +namespace jit { + +// clang-format off + +// From operator_versions_map +${operator_version_map} + +const std::vector& getUpgraderBytecodeList() { + auto generate_upgrader_bytecode_list = []() { + std::vector upgrader_function_list({ + ${upgrader_bytecode} + }); + for (const auto& upgrader_function : upgrader_function_list) { + for (const auto& op : upgrader_function.operators) { + upgrader_function.function.append_operator( + op.name, + op.overload_name, + op.num_specified_args); + } + } + return upgrader_function_list; + }; + static std::vector upgraderBytecodeList = + generate_upgrader_bytecode_list(); + return upgraderBytecodeList; +} + +// clang-format on + +} // namespace jit +} // namespace torch +""" +) + +UPGRADER_MOBILE_FILE_NAME = "upgrader_mobile.cpp" + +UPGRADER_ELEMENT = CodeTemplate( + """\ +Upgrader({${min_version}, ${max_version}, ${operator_name}, ${index}}), +""" +) + +PER_OPERATOR_UPGRADER_LIST = CodeTemplate( + """\ +{ + std::string(${operator_name}), + std::vector({${upgrader_list}}); +} +""" 
+
+
+def construct_instruction(instruction_list_from_yaml: list[Any]) -> str:
+    instruction_list_part = [
+        ONE_INSTRUCTION.substitute(
+            operator_name=instruction[0],
+            X=instruction[1],
+            N=instruction[2],
+        )
+        for instruction in instruction_list_from_yaml
+    ]
+    return INSTRUCTION_LIST.substitute(
+        instruction_list="".join(instruction_list_part).lstrip("\n")
+    )
+
+
+def construct_constants(constants_list_from_yaml: list[Any]) -> str:
+    constants_list_part = []
+    for constant_from_yaml in constants_list_from_yaml:
+        convert_constant = None
+        if isinstance(constant_from_yaml, str):
+            # Add quotes if it's a string
+            convert_constant = f'"{constant_from_yaml}"'
+        elif isinstance(constant_from_yaml, bool):
+            convert_constant = "true" if constant_from_yaml else "false"
+        elif constant_from_yaml is None:
+            convert_constant = ""
+        elif isinstance(constant_from_yaml, int):
+            convert_constant = str(constant_from_yaml)
+        else:
+            raise ValueError(
+                f"The type of {constant_from_yaml} is {type(constant_from_yaml)}. "
+                "Please add support for it in construct_constants() in gen_mobile_upgraders.py."
+            )
+        constants_list_part.append(ONE_CONSTANT.substitute(constant=convert_constant))
+    if len(constants_list_part) == 0:
+        return CONSTANTS_LIST_EMPTY
+    return CONSTANT_LIST.substitute(
+        constant_list="".join(constants_list_part).lstrip("\n")
+    )
+
+
+def construct_operators(operator_list_from_yaml: list[Any]) -> str:
+    operator_list_part = [
+        ONE_OPERATOR_STRING.substitute(
+            operator_name=operator[0],
+            overload_name=operator[1],
+            num_of_args=operator[2],
+        )
+        for operator in operator_list_from_yaml
+    ]
+    return OPERATOR_STRING_LIST.substitute(
+        operator_string_list="".join(operator_list_part).lstrip("\n")
+    )
+
+
+def construct_types(types_tr_list_from_yaml: list[Any]) -> str:
+    types_tr_list_part = [
+        ONE_TYPE.substitute(type_str=types_tr) for types_tr in types_tr_list_from_yaml
+    ]
+    if len(types_tr_list_part) == 0:
+        return TYPE_LIST_EMPTY
+    return TYPE_LIST.substitute(type_list="".join(types_tr_list_part).lstrip("\n"))
+
+
+def construct_register_size(register_size_from_yaml: int) -> str:
+    if not isinstance(register_size_from_yaml, int):
+        raise ValueError(
+            f"Input register size is {register_size_from_yaml} and "
+            f"its type is {type(register_size_from_yaml)}. An int type is expected."
+        )
+    return str(register_size_from_yaml)
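To make the constant conversion above concrete: None maps to an empty c10::IValue(), bools map to true/false, ints are emitted verbatim, and strings are quoted. A hedged example with an invented constants table (real tables come from generate_upgraders_bytecode()), assuming the helpers above are in scope:

# Hypothetical input; output shown modulo exact indentation.
construct_constants([None, True, 4, "cpu"])
# returns a C++ fragment equivalent to:
#   std::vector<c10::IValue>({
#       c10::IValue(),
#       c10::IValue(true),
#       c10::IValue(4),
#       c10::IValue("cpu"),
#   }), // constants list
# and construct_constants([]) returns CONSTANTS_LIST_EMPTY.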
+
+
+def construct_version_maps(
+    upgrader_bytecode_function_to_index_map: dict[str, Any],
+) -> str:
+    version_map = torch._C._get_operator_version_map()
+    sorted_version_map_ = sorted(version_map.items(), key=itemgetter(0))  # type: ignore[no-any-return]
+    sorted_version_map = dict(sorted_version_map_)
+
+    operator_list_in_version_map_part = []
+    for op_name in sorted_version_map:
+        upgraders_in_version_map_part = []
+        # TODO: remove the skip after these two operators' schemas are fixed
+        if op_name in EXCLUDED_OP_SET:
+            continue
+        upgrader_ranges = torch._C._get_upgrader_ranges(op_name)
+        upgrader_entries = sorted_version_map[op_name]
+        assert len(upgrader_ranges) == len(upgrader_entries)
+        for idx, upgrader_entry in enumerate(upgrader_entries):
+            upgrader_name = upgrader_entry.upgrader_name
+            bytecode_function_index = upgrader_bytecode_function_to_index_map[
+                upgrader_name
+            ]
+            upgraders_in_version_map_part.append(
+                ONE_UPGRADER_IN_VERSION_MAP.substitute(
+                    upgrader_min_version=upgrader_ranges[idx].min_version,
+                    upgrader_max_version=upgrader_ranges[idx].max_version,
+                    upgrader_name=upgrader_name,
+                    bytecode_func_index=bytecode_function_index,
+                )
+            )
+        operator_list_in_version_map_part.append(
+            ONE_OPERATOR_IN_VERSION_MAP.substitute(
+                operator_name=op_name,
+                upgrader_list_in_version_map="".join(upgraders_in_version_map_part),
+            )
+        )
+    return OPERATOR_VERSION_MAP.substitute(
+        operator_list_in_version_map="".join(operator_list_in_version_map_part).lstrip(
+            "\n"
+        )
+    )
+
+
+def get_upgrader_bytecode_function_to_index_map(
+    upgrader_dict: list[dict[str, Any]],
+) -> dict[str, Any]:
+    upgrader_bytecode_function_to_index_map = {}
+    index = 0
+    for upgrader_bytecode in upgrader_dict:
+        for upgrader_name in upgrader_bytecode:
+            if upgrader_name in EXCLUDE_UPGRADER_SET:
+                continue
+            upgrader_bytecode_function_to_index_map[upgrader_name] = index
+            index += 1
+    return upgrader_bytecode_function_to_index_map
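For intuition about the two lookup structures being generated: the upgrader bytecode list is emitted in upgrader_dict order, so the function above records each non-excluded upgrader's position in that list, and construct_version_maps() embeds those positions as the ${bytecode_func_index} of each Upgrader entry. A sketch with invented upgrader names and elided bytecode tables:

# Hypothetical shape, mimicking generate_upgraders_bytecode():
# one single-key dict per upgrader, in bytecode order.
upgrader_dict = [
    {"div_Tensor_0_3": {}},  # bytecode tables elided
    {"full_0_4": {}},        # in EXCLUDE_UPGRADER_SET, so it gets no index
    {"linspace_0_7": {}},
]
# get_upgrader_bytecode_function_to_index_map(upgrader_dict)
# -> {"div_Tensor_0_3": 0, "linspace_0_7": 1}
#
# An illustrative entry in the generated version map would then read:
#   {std::string("aten::div.Tensor"),
#       std::vector<Upgrader>({
#           Upgrader({0, 3, "div_Tensor_0_3", 0})
#       })},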
+
+
+def write_cpp(cpp_path: str, upgrader_dict: list[dict[str, Any]]) -> None:
+    upgrader_bytecode_function_to_index_map = (
+        get_upgrader_bytecode_function_to_index_map(upgrader_dict)
+    )
+    version_map_src = construct_version_maps(upgrader_bytecode_function_to_index_map)
+    all_upgrader_src_string = []
+    for upgrader_bytecode in upgrader_dict:
+        for upgrader_name, bytecode in upgrader_bytecode.items():
+            # TODO: remove the skip after these two operators' schemas are fixed
+            if upgrader_name in EXCLUDE_UPGRADER_SET:
+                continue
+            instruction_list_str = ""
+            constant_list_str = ""
+            type_list_str = ""
+            register_size_str = ""
+            operator_list_str = ""
+            for table_name, contents in bytecode.items():
+                element = ByteCode[table_name]
+                if element is ByteCode.instructions:
+                    instruction_list_str = construct_instruction(contents)
+                elif element is ByteCode.constants:
+                    constant_list_str = construct_constants(contents)
+                elif element is ByteCode.operators:
+                    operator_list_str = construct_operators(contents)
+                elif element is ByteCode.types:
+                    type_list_str = construct_types(contents)
+                elif element is ByteCode.register_size:
+                    register_size_str = construct_register_size(contents)
+
+            one_upgrader_function_string = ONE_UPGRADER_FUNCTION.substitute(
+                upgrader_name=upgrader_name,
+                instruction_list=instruction_list_str,
+                constant_list=constant_list_str,
+                type_list=type_list_str,
+                register_size=register_size_str,
+            )
+            one_upgrader_src_string = ONE_UPGRADER_SRC.substitute(
+                bytecode_function=one_upgrader_function_string.lstrip("\n"),
+                operator_string_list=operator_list_str.lstrip("\n"),
+            )
+            all_upgrader_src_string.append(one_upgrader_src_string)
+
+    upgrader_file_content = UPGRADER_CPP_SRC.substitute(
+        operator_version_map=version_map_src,
+        upgrader_bytecode="".join(all_upgrader_src_string).lstrip("\n"),
+    )
+    print("writing file to:", os.path.join(cpp_path, UPGRADER_MOBILE_FILE_NAME))
+    with open(os.path.join(cpp_path, UPGRADER_MOBILE_FILE_NAME), "wb") as out_file:
+        out_file.write(upgrader_file_content.encode("utf-8"))
+
+
+def sort_upgrader(upgrader_list: list[dict[str, Any]]) -> list[dict[str, Any]]:
+    sorted_upgrader_list = sorted(
+        upgrader_list, key=lambda one_upgrader: next(iter(one_upgrader))
+    )
+    return sorted_upgrader_list
+
+
+def main() -> None:
+    upgrader_list = generate_upgraders_bytecode()
+    sorted_upgrader_list = sort_upgrader(upgrader_list)
+    for up in sorted_upgrader_list:
+        print("after sort, upgrader:", next(iter(up)))
+
+    pytorch_dir = Path(__file__).resolve().parents[2]
+    upgrader_path = pytorch_dir / "torch" / "csrc" / "jit" / "mobile"
+    write_cpp(str(upgrader_path), sorted_upgrader_list)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/gen_mobile_upgraders_constant.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/gen_mobile_upgraders_constant.py
new file mode 100644
index 0000000000000000000000000000000000000000..04b5ad887e54153115eeca7b6686d7c2de8dfc06
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/operator_versions/gen_mobile_upgraders_constant.py
@@ -0,0 +1,7 @@
+MOBILE_UPGRADERS_HEADER_DESCRIPTION = """/**
+ * @generated
+ * This is an auto-generated file. Please do not modify it by hand.
+ * To re-generate, please run:
+ * cd ~/pytorch && python torchgen/operator_versions/gen_mobile_upgraders.py
+ */
+"""
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/native/native_functions.yaml b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/native/native_functions.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..db737962fd7bc22cd6e002d4f82128def325f0ca
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/native/native_functions.yaml
@@ -0,0 +1,16098 @@
+# See README.md in this directory for more guidance
+
+# *********NB: _cast_* operators are DEPRECATED and will be removed
+# eventually. These were previously used before TorchScript IR supported
+# representing ScalarType's. They are now superseded by usage of
+# `aten::to()`. The ops remain here for backward compatibility purposes.
+
+# DEPRECATED. DO NOT USE
+- func: _cast_Byte(Tensor self, bool non_blocking=False) -> Tensor
+  variants: function
+
+# DEPRECATED. DO NOT USE
+- func: _cast_Char(Tensor self, bool non_blocking=False) -> Tensor
+  variants: function
+
+# DEPRECATED. DO NOT USE
+- func: _cast_Double(Tensor self, bool non_blocking=False) -> Tensor
+  variants: function
+
+# DEPRECATED. DO NOT USE
+- func: _cast_Float(Tensor self, bool non_blocking=False) -> Tensor
+  variants: function
+
+# DEPRECATED. DO NOT USE
+- func: _cast_Int(Tensor self, bool non_blocking=False) -> Tensor
+  variants: function
+
+# DEPRECATED. DO NOT USE
+- func: _cast_Long(Tensor self, bool non_blocking=False) -> Tensor
+  variants: function
+
+# DEPRECATED. DO NOT USE
+- func: _cast_Short(Tensor self, bool non_blocking=False) -> Tensor
+  variants: function
+
+# DEPRECATED.
DO NOT USE +- func: _cast_Half(Tensor self, bool non_blocking=False) -> Tensor + variants: function + +# Computes the gradient of current tensor w.r.t. graph leaves. +- func: _backward(Tensor self, Tensor[] inputs, Tensor? gradient=None, bool? retain_graph=None, bool create_graph=False) -> () + manual_cpp_binding: True + variants: method + +# DEPRECATED. Sets the tensor data held by this `Variable` to be the same as +# `new_data`. It requires that `new_data` and `Variable` have compatible tensor +# type, by checking `_has_compatible_shallow_copy_type(this, new_data)`. +# +# This function is deprecated because it doesn't really make sense in a world +# where Variables *are* Tensors (as opposed to them containing tensors, which +# is what the previous interpretation was.) +- func: set_data(Tensor(a!) self, Tensor new_data) -> () + manual_cpp_binding: True + variants: method + +- func: data(Tensor self) -> Tensor + manual_cpp_binding: True + variants: method + +# True if this `Variable` is a leaf and thus does not have a `grad_fn`. +- func: is_leaf(Tensor self) -> bool + manual_cpp_binding: True + variants: method + +# Returns the output index of this variable from the forward operation that +# produced it. Conversely, it returns the input index of the gradient `Node` to +# which this `Variable` is connected (because in the gradient computation, +# inputs and outputs switch meaning). For example: +# +# y0, y1, y2 = f(x) +# assert y0.output_nr == 0 +# assert y1.output_nr == 1 +# assert y2.output_nr == 2 +# +- func: output_nr(Tensor self) -> int + manual_cpp_binding: True + variants: method + +- func: _version(Tensor self) -> int + manual_cpp_binding: True + variants: method + +- func: requires_grad_(Tensor(a!) self, bool requires_grad=True) -> Tensor(a!) + manual_cpp_binding: True + variants: method + +# Enables .grad attribute for non-leaf Tensors. +- func: retain_grad(Tensor(a!) self) -> () + manual_cpp_binding: True + variants: method + +- func: retains_grad(Tensor self) -> bool + manual_cpp_binding: True + variants: method + +- func: _fw_primal(Tensor(a) self, int level) -> Tensor(a) + variants: method + dispatch: + CompositeExplicitAutograd: _fw_primal + +- func: _make_dual(Tensor(a) primal, Tensor tangent, int level) -> Tensor(a) + variants: function + dispatch: + CompositeExplicitAutograd: _make_dual + +- func: _unpack_dual(Tensor(a) dual, int level) -> (Tensor(a) primal, Tensor tangent) + variants: function + +# NOTE: [_new_zeros_with_same_feature_meta] +# This function creates a new tensor with the layout and TensorOptions +# of `other` but also takes into account the batch dimensions of `self` +# +# This function has a couple extra constraints because it is also used for `jvp` +# in functorch. +# - is used for forward AD because there is the restriction +# that the primal and tangent must have the same layout +# - We cannot assume that `self` and `other` have the same sizes or even dim +# because in the inplace over view case, `other` is the base tensor, and +# `self` is the forward grad with respect to the view, which can have an +# entirely different shape +# - takes the number of batch dims for `self` because we also handle +# some batching logic. We handle that here instead of a batching rule because +# we'd like to avoid calling as_strided in the batching rule (as to enable +# nested vmap in functorch). +# - needs to be CompositeExplicitAutograd for jvp support in functorch. 
+# functorch currently relies on TensorWrapper which does not have storage +# CompositeExplicitAutograd makes sure the TensorWrapper is unwrapped. +# - this function may eventually take on another int argument to store the +# the number of batch dims for other once we support that use case +- func: _new_zeros_with_same_feature_meta(Tensor self, Tensor other, *, int self_num_batch_dims=0) -> Tensor + variants: function + dispatch: + CompositeExplicitAutograd: _new_zeros_with_same_feature_meta + autogen: _new_zeros_with_same_feature_meta.out + +# This function compares the storage numel of self with that of other, where +# storage numel is computed as: `other.storage().nbytes() / other.itemsize()`. +# We create this function for composite compliance purposes. The batching rule +# always returns true because vmapped as_strided does not support accessing +# storage locations not indexable by the input tensor. +# See the note above for more information. +- func: _has_same_storage_numel(Tensor self, Tensor other) -> bool + variants: function + dispatch: + CompositeExplicitAutograd: _has_same_storage_numel + +- func: rename_(Tensor(a!) self, Dimname[]? names) -> Tensor(a!) + variants: method + tags: inplace_view + +- func: rename(Tensor(a) self, Dimname[]? names) -> Tensor(a) + variants: method + +- func: align_to(Tensor(a) self, Dimname[] names) -> Tensor(a) + variants: method + +- func: align_to.ellipsis_idx(Tensor(a) self, Dimname[] order, int ellipsis_idx) -> Tensor(a) + variants: method + +- func: align_as(Tensor self, Tensor other) -> Tensor + variants: method + +- func: align_tensors(Tensor[] tensors) -> Tensor[] + +# Not assert because it's a keyword; not Assert because FX already +# took that syntax +# TODO: need to specify this is side-effectful somehow +- func: _assert_async(Tensor self) -> () + dispatch: + CPU: _assert_async_cpu + CUDA: _assert_async_cuda + +- func: _assert_async.msg(Tensor self, str assert_msg) -> () + dispatch: + CPU: _assert_async_msg_cpu + CUDA: _assert_async_msg_cuda + +- func: _assert_scalar(Scalar self, str assert_msg) -> () + dispatch: + CompositeExplicitAutograd: _assert_scalar + +- func: _functional_assert_scalar(Scalar self, str assert_msg, Tensor dep_token) -> Tensor + dispatch: + CompositeExplicitAutograd: _functional_assert_scalar + +- func: _functional_assert_async.msg(Tensor self, str assert_msg, Tensor dep_token) -> Tensor + dispatch: + CPU: _functional_assert_async_msg_cpu + +- func: _assert_tensor_metadata(Tensor a, SymInt[]? size=None, SymInt[]? stride=None, ScalarType? dtype=None, *, Device? device=None, Layout? layout=None) -> () + dispatch: + CompositeExplicitAutograd: _assert_tensor_metadata + Meta: _assert_tensor_metadata_meta_symint + +- func: _print(str s) -> () + dispatch: + CompositeExplicitAutograd: _print + +- func: sym_constrain_range(Scalar size, *, int? min=None, int? max=None) -> () + dispatch: + CompositeExplicitAutograd: sym_constrain_range + +- func: sym_constrain_range_for_size(Scalar size, *, int? min=None, int? max=None) -> () + dispatch: + CompositeExplicitAutograd: sym_constrain_range_for_size + +- func: _functional_sym_constrain_range(Scalar size, int? min, int? max, Tensor dep_token) -> Tensor + dispatch: + CompositeExplicitAutograd: _functional_sym_constrain_range + +- func: _functional_sym_constrain_range_for_size(Scalar size, int? min, int? max, Tensor dep_token) -> Tensor + dispatch: + CompositeExplicitAutograd: _functional_sym_constrain_range_for_size + +- func: _make_dep_token(*, ScalarType? dtype=None, Layout? 
layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + dispatch: + CPU: _make_dep_token_cpu + +- func: refine_names(Tensor(a) self, Dimname[] names) -> Tensor(a) + variants: method + +- func: _use_cudnn_ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank) -> bool + device_check: NoCheck # Tensor arguments allowed to be on different devices, see also _cudnn_ctc_loss + dispatch: + CUDA: _use_cudnn_ctc_loss + +- func: _use_cudnn_ctc_loss.Tensor(Tensor log_probs, Tensor targets, Tensor input_lengths, Tensor target_lengths, int blank) -> bool + device_check: NoCheck # Tensor arguments allowed to be on different devices, see also _cudnn_ctc_loss + dispatch: + CUDA: _use_cudnn_ctc_loss_tensor + +- func: _cudnn_ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank, bool deterministic, bool zero_infinity) -> (Tensor, Tensor) + device_check: NoCheck # log_probs is expected to be on CUDA while targets is expected to be on CPU + dispatch: + CUDA: _cudnn_ctc_loss + autogen: _cudnn_ctc_loss.out + +- func: _cudnn_ctc_loss.Tensor(Tensor log_probs, Tensor targets, Tensor input_lengths, Tensor target_lengths, int blank, bool deterministic, bool zero_infinity) -> (Tensor, Tensor) + device_check: NoCheck # log_probs is expected to be on CUDA while targets is expected to be on CPU + dispatch: + CUDA: _cudnn_ctc_loss_tensor + +- func: _use_cudnn_rnn_flatten_weight() -> bool + +- func: _cudnn_rnn_flatten_weight(Tensor[] weight_arr, int weight_stride0, SymInt input_size, int mode, SymInt hidden_size, SymInt proj_size, int num_layers, bool batch_first, bool bidirectional) -> Tensor + dispatch: + CUDA: _cudnn_rnn_flatten_weight + autogen: _cudnn_rnn_flatten_weight.out + +- func: _cudnn_rnn(Tensor input, Tensor[] weight, int weight_stride0, Tensor? weight_buf, Tensor hx, Tensor? cx, int mode, SymInt hidden_size, SymInt proj_size, int num_layers, bool batch_first, float dropout, bool train, bool bidirectional, SymInt[] batch_sizes, Tensor? dropout_state) -> (Tensor, Tensor, Tensor, Tensor, Tensor) + # rnn_tanh may or may not redispatch to _cudnn_rnn based on algorithm and build. Thus it might hit dispatch or kernel device check. + # Disable dispatch time device check for consistent behavior. + device_check: NoCheck + dispatch: + CUDA: _cudnn_rnn + autogen: _cudnn_rnn.out + tags: nondeterministic_seeded + +- func: _cudnn_rnn_backward(Tensor input, Tensor[] weight, int weight_stride0, Tensor weight_buf, Tensor hx, Tensor? cx, Tensor output, Tensor? grad_output, Tensor? grad_hy, Tensor? grad_cy, int mode, SymInt hidden_size, SymInt proj_size, int num_layers, bool batch_first, float dropout, bool train, bool bidirectional, SymInt[] batch_sizes, Tensor? dropout_state, Tensor reserve, bool[4] output_mask) -> (Tensor, Tensor, Tensor, Tensor[]) + dispatch: + CUDA: _cudnn_rnn_backward + autogen: _cudnn_rnn_backward.out + +- func: _cudnn_init_dropout_state(float dropout, bool train, int dropout_seed, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor + dispatch: + CUDA: _cudnn_init_dropout_state + autogen: _cudnn_init_dropout_state.out + tags: nondeterministic_seeded + +- func: _debug_has_internal_overlap(Tensor self) -> int + variants: function + +- func: _fused_dropout(Tensor self, float p, Generator? 
generator=None) -> (Tensor, Tensor) + variants: function + dispatch: + CUDA: fused_dropout_cuda + tags: nondeterministic_seeded + autogen: _fused_dropout.out + +- func: _masked_scale(Tensor self, Tensor mask, float scale) -> Tensor + variants: function + dispatch: + CUDA: masked_scale_cuda + autogen: _masked_scale.out + +- func: native_dropout(Tensor input, float p, bool? train) -> (Tensor, Tensor) + variants: function + dispatch: + CPU: native_dropout_cpu + CUDA: native_dropout_cuda + MPS: native_dropout_mps + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: native_dropout_nested + tags: [nondeterministic_seeded, core] + autogen: native_dropout.out + +- func: native_dropout_backward(Tensor grad_output, Tensor mask, float scale) -> Tensor + dispatch: + CPU, NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: native_dropout_backward + CUDA: native_dropout_backward_cuda + MPS: native_dropout_backward_mps + autogen: native_dropout_backward.out + tags: pointwise + +- func: _sobol_engine_draw(Tensor quasi, int n, Tensor sobolstate, int dimension, int num_generated, ScalarType? dtype) -> (Tensor, Tensor) + +- func: _sobol_engine_ff_(Tensor(a!) self, int n, Tensor sobolstate, int dimension, int num_generated) -> Tensor(a!) + +- func: _sobol_engine_scramble_(Tensor(a!) self, Tensor ltm, int dimension) -> Tensor(a!) + +- func: _sobol_engine_initialize_state_(Tensor(a!) self, int dimension) -> Tensor(a!) + +- func: _reshape_from_tensor(Tensor self, Tensor shape) -> Tensor + +- func: _shape_as_tensor(Tensor self) -> Tensor + +- func: dropout(Tensor input, float p, bool train) -> Tensor + tags: [nondeterministic_seeded, maybe_aliasing_or_mutating] + +- func: dropout_(Tensor(a!) self, float p, bool train) -> Tensor(a!) + tags: nondeterministic_seeded + +- func: feature_dropout(Tensor input, float p, bool train) -> Tensor + tags: [nondeterministic_seeded, maybe_aliasing_or_mutating] + +- func: feature_dropout_(Tensor(a!) self, float p, bool train) -> Tensor(a!) + tags: nondeterministic_seeded + +- func: alpha_dropout(Tensor input, float p, bool train) -> Tensor + tags: [nondeterministic_seeded, maybe_aliasing_or_mutating] + +- func: alpha_dropout_(Tensor(a!) self, float p, bool train) -> Tensor(a!) + tags: nondeterministic_seeded + +- func: feature_alpha_dropout(Tensor input, float p, bool train) -> Tensor + tags: [nondeterministic_seeded, maybe_aliasing_or_mutating] + +- func: feature_alpha_dropout_(Tensor(a!) self, float p, bool train) -> Tensor(a!) + tags: nondeterministic_seeded + +- func: abs(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: abs + SparseCPU, SparseCUDA, SparseMPS: abs_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMPS, SparseCsrMeta: abs_sparse_csr + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_abs + tags: [core, pointwise] + +- func: abs_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: abs_ + SparseCPU, SparseCUDA, SparseMPS: abs_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMPS, SparseCsrMeta: abs_sparse_csr_ + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_abs_ + +- func: abs.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) 
+ device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MPS, MTIA: abs_out + SparseCPU, SparseCUDA, SparseMPS: abs_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMPS, SparseCsrMeta: abs_sparse_csr_out + tags: pointwise + +# Note [Adding an alias] +# To add an alias do the following: +# +# 1) Copy the original functions native_functions.yaml entry, but replace the +# original function's name with their own and delete any dispatch +# keys for the aliases. Specifying a dispatch key will prevent +# autograd from recording the operations the alias performs, which +# will stop it from "inheriting" the original operation's autograd behavior. +# 2) Implement the corresponding functions and have them redispatch to the +# original function. +# 3) Add docstrings to the new function that reference the original function, +# and document the method as usual (if it exists.) +# (See torch/_torch_docs.py and docs/source/torch.rst if adding a function, +# torch/_tensor_docs.py and docs/source/tensors.rst if adding a method, +# or module-specific doc bindings (like torch/linalg/__init__.py) if +# adding an alias in a namespace.) +# 4) Update torch/overrides.py consistent with the original function. +# 5) Update the alias_map in torch/csrc/jit/passes/normalize_ops.cpp. +# 6) Add aliases argument to existing OpInfo/UnaryUfuncInfo or create new OpInfo/UnaryUfuncInfo entry +# in op_db list in torch/testing/_internal/common_methods_invocations.py +# +# See torch.absolute, an alias for torch.abs, as an example. +# Absolute, alias for abs + +- func: absolute(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + +- func: absolute_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + +- func: absolute.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + +- func: angle(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CPU, CUDA, MPS: angle + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: angle_sparse_csr + tags: pointwise + +- func: angle.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MPS: angle_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: angle_sparse_csr_out + tags: pointwise + +- func: view_as_real(Tensor(a) self) -> Tensor(a) + variants: function + dispatch: + CPU, CUDA, MPS, Meta: view_as_real + +- func: view_as_complex(Tensor(a) self) -> Tensor(a) + variants: function + dispatch: + CPU, CUDA, MPS, Meta: view_as_complex + +- func: sgn(Tensor self) -> Tensor + variants: function, method + structured_delegate: sgn.out + dispatch: + SparseCPU, SparseCUDA, SparseMPS: sgn_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sgn_sparse_csr + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_sgn + tags: pointwise + +- func: sgn_(Tensor(a!) self) -> Tensor(a!) + variants: method + structured_delegate: sgn.out + dispatch: + SparseCPU, SparseCUDA, SparseMPS: sgn_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sgn_sparse_csr_ + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_sgn_ + tags: pointwise + +- func: sgn.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) 
+ structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: sgn_out + MPS: sgn_out_mps + SparseCPU, SparseCUDA, SparseMPS: sgn_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sgn_sparse_csr_out + tags: pointwise + +- func: chalf(Tensor self, *, MemoryFormat? memory_format=None) -> Tensor + variants: method + +- func: real(Tensor(a) self) -> Tensor(a) + device_check: NoCheck # TensorIterator + variants: function + +- func: imag(Tensor(a) self) -> Tensor(a) + device_check: NoCheck # TensorIterator + variants: function + +- func: _conj(Tensor(a) self) -> Tensor(a) + variants: function, method + dispatch: + CompositeExplicitAutograd: _conj + +- func: conj(Tensor(a) self) -> Tensor(a) + variants: function, method + manual_cpp_binding: True + +- func: _conj_physical(Tensor self) -> Tensor + variants: function, method + dispatch: + CompositeExplicitAutograd: _conj_physical + SparseCsrCPU, SparseCsrCUDA, SparseCsrMPS, SparseCsrMeta: conj_physical_sparse_csr + autogen: _conj_physical.out + +- func: conj_physical(Tensor self) -> Tensor + variants: function, method + tags: [pointwise, maybe_aliasing_or_mutating] + +- func: conj_physical.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA: conj_physical_out + MPS: conj_physical_out_mps + SparseCPU, SparseCUDA, SparseMPS: conj_physical_out_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMPS, SparseCsrMeta: conj_physical_sparse_csr_out + tags: pointwise + +- func: conj_physical_(Tensor(a!) self) -> Tensor(a!) + variants: function, method + dispatch: + CompositeExplicitAutograd: conj_physical_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: conj_physical_sparse_csr_ + tags: pointwise + +- func: resolve_conj(Tensor(a) self) -> Tensor(a) + variants: function, method + +- func: resolve_neg(Tensor(a) self) -> Tensor(a) + variants: function, method + +- func: _neg_view(Tensor(a) self) -> Tensor(a) + variants: function, method + dispatch: + CompositeExplicitAutograd: _neg_view + +- func: acos(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: acos.out + tags: [core, pointwise] + +- func: acos_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: acos.out + tags: pointwise + +- func: acos.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: acos_out + tags: pointwise + +# arccos, alias of acos +- func: arccos(Tensor self) -> Tensor + variants: function, method + +- func: arccos_(Tensor(a!) self) -> Tensor(a!) + variants: function, method + +- func: arccos.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) 
+ +- func: avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=0, bool ceil_mode=False, bool count_include_pad=True) -> Tensor + tags: core + autogen: avg_pool1d.out + +- func: adaptive_avg_pool1d(Tensor self, int[1] output_size) -> Tensor + tags: core + autogen: adaptive_avg_pool1d.out + +# Return: (Tensor output, Tensor indices) +- func: adaptive_max_pool1d(Tensor self, int[1] output_size) -> (Tensor, Tensor) + +- func: add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: add.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: add_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: add_sparse_csr + MkldnnCPU: mkldnn_add + ZeroTensor: add_zerotensor + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_add_Tensor + tags: [core, pointwise] + +- func: add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + structured_delegate: add.out + dispatch: + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: add_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: add_sparse_csr_ + MkldnnCPU: mkldnn_add_ + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_add__Tensor + tags: pointwise + +- func: add.out(Tensor self, Tensor other, *, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + ufunc_inner_loop: + Generic: add (AllAndComplex, BFloat16, Half, ComplexHalf) + ScalarOnly: add (Bool) + dispatch: + SparseCPU, SparseMeta: add_out_sparse_cpu + SparseCUDA: add_out_sparse_cuda + SparseMPS: add_out_sparse_mps + SparseCsrCPU, SparseCsrMeta: add_out_sparse_compressed_cpu + SparseCsrCUDA: add_out_sparse_compressed_cuda + MkldnnCPU: mkldnn_add_out + MPS: add_out_mps + MTIA: add_out_mtia + tags: pointwise + +- func: _add_relu.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor + variants: function + dispatch: + CPU: add_relu + +- func: _add_relu_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!) + variants: function + dispatch: + CPU: add_relu_ + +- func: _add_relu.out(Tensor self, Tensor other, *, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!) + variants: function + dispatch: + CPU: add_relu_out + +- func: _add_relu.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> Tensor + variants: function + dispatch: + CPU: add_relu + +- func: _add_relu_.Scalar(Tensor(a!) self, Scalar other, Scalar alpha=1) -> Tensor(a!) + variants: function + dispatch: + CPU: add_relu_ + autogen: _add_relu.Scalar_out + +# For C++ only, until we have conversion from C++ numbers to Tensor +- func: add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: add + tags: [core, pointwise] + +- func: add_.Scalar(Tensor(a!) self, Scalar other, Scalar alpha=1) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CompositeExplicitAutograd: add_ + autogen: add.Scalar_out + tags: pointwise + +- func: addmv(Tensor self, Tensor mat, Tensor vec, *, Scalar beta=1, Scalar alpha=1) -> Tensor + structured_delegate: addmv.out + variants: function, method + +- func: addmv_(Tensor(a!) self, Tensor mat, Tensor vec, *, Scalar beta=1, Scalar alpha=1) -> Tensor(a!) 
+ structured_delegate: addmv.out + variants: function, method + +- func: addmv.out(Tensor self, Tensor mat, Tensor vec, *, Scalar beta=1, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!) + structured: True + dispatch: + CPU: addmv_out_cpu + CUDA: addmv_out_cuda + MPS: addmv_out_mps + XPU: addmv_out_xpu + SparseCsrCPU: addmv_out_sparse_compressed + SparseCsrCUDA: addmv_out_sparse_compressed_cuda + +- func: addr(Tensor self, Tensor vec1, Tensor vec2, *, Scalar beta=1, Scalar alpha=1) -> Tensor + variants: function, method + dispatch: + CPU, CUDA: addr + MPS: addr_mps + CompositeExplicitAutograd: math_addr + +- func: addr_(Tensor(a!) self, Tensor vec1, Tensor vec2, *, Scalar beta=1, Scalar alpha=1) -> Tensor(a!) + variants: method + dispatch: + CompositeExplicitAutograd: addr_ + +- func: addr.out(Tensor self, Tensor vec1, Tensor vec2, *, Scalar beta=1, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA: addr_out + MPS: addr_out_mps + CompositeExplicitAutograd: math_addr_out + +- func: affine_grid_generator(Tensor theta, SymInt[] size, bool align_corners) -> Tensor + variants: function + dispatch: + CompositeExplicitAutograd: affine_grid_generator + autogen: affine_grid_generator.out + +- func: affine_grid_generator_backward(Tensor grad, SymInt[] size, bool align_corners) -> Tensor + variants: function + +- func: _is_all_true(Tensor self) -> Tensor + variants: function, method + dispatch: + CompositeExplicitAutograd: _is_all_true + +- func: _is_any_true(Tensor self) -> Tensor + variants: function, method + dispatch: + CompositeExplicitAutograd: _is_any_true + +# Note: this function is only for testing. +- func: _test_check_tensor(Tensor self) -> Tensor + variants: function + +# Note; this function is only for testing +- func: _test_functorch_fallback(Tensor self, Tensor other) -> Tensor + variants: function + dispatch: + CPU: _test_functorch_fallback + autogen: _test_functorch_fallback.out + +- func: all.dim(Tensor self, int dim, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: all.out + variants: function, method + dispatch: + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_all + tags: reduction + + +- func: all.dims(Tensor self, int[]? dim=None, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: all.dims_out + variants: function, method + cpp_no_default_args: ['dim'] + dispatch: + CompositeExplicitAutograd: all_dims_default + tags: reduction + +- func: all.out(Tensor self, int dim, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + dispatch: + CPU, CUDA: all_out + MPS: all_out_mps + MTIA: all_out_mtia + tags: reduction + +- func: all.dims_out(Tensor self, int[]? dim=None, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + dispatch: + CPU, CUDA: all_dims_out + CompositeExplicitAutograd: all_dims_out_default + cpp_no_default_args: ['dim'] + tags: reduction + +- func: all.dimname(Tensor self, Dimname dim, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + tags: reduction + +- func: all.dimname_out(Tensor self, Dimname dim, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) 
+ device_check: NoCheck # TensorIterator + tags: reduction + +- func: allclose(Tensor self, Tensor other, float rtol=1e-05, float atol=1e-08, bool equal_nan=False) -> bool + variants: function, method + tags: data_dependent_output + dispatch: + CompositeExplicitAutograd: allclose + +- func: any.dim(Tensor self, int dim, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: any.out + variants: function, method + tags: [core, reduction] + +- func: any.dims(Tensor self, int[]? dim=None, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: any.dims_out + variants: function, method + cpp_no_default_args: ['dim'] + tags: [core, reduction] + dispatch: + CompositeExplicitAutograd: any_dims_default + +- func: any.out(Tensor self, int dim, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + dispatch: + CPU, CUDA: any_out + MPS: any_out_mps + tags: reduction + +- func: any.dims_out(Tensor self, int[]? dim=None, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + dispatch: + CPU, CUDA: any_dims_out + CompositeExplicitAutograd: any_dims_out_default + cpp_no_default_args: ['dim'] + tags: reduction + +- func: any.dimname(Tensor self, Dimname dim, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + tags: reduction + +- func: any.dimname_out(Tensor self, Dimname dim, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + tags: reduction + +- func: arange(Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: arange + +- func: arange.start(Scalar start, Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: arange + +# This operator should be named `arange.start_out` if following the naming convention. However that +# name is already taken. Disabled because of CI job failures. +# FIXME: enable this +#- func: arange.start_out_(Scalar start, Scalar end, *, Tensor(a!) out) -> Tensor(a!) +# dispatch: +# CompositeExplicitAutograd: arange_start_out + +- func: arange.start_step(Scalar start, Scalar end, Scalar step=1, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: arange + cpp_no_default_args: ['step'] + tags: core + +- func: arange.out(Scalar end, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: arange_out + +- func: arange.start_out(Scalar start, Scalar end, Scalar step=1, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, Meta: arange_out + CUDA: arange_cuda_out + MPS: arange_mps_out + MTIA: arange_mtia_out + cpp_no_default_args: ['step'] + +# This function is a temporary hack to allow tracing of arange like constructs with dynamic +# bounds on arange. Normal arange is not traceable because it does not take any tensor inputs; +# if the range you need is based on another tensor, calling this function directly will +# preserve tracing. Get rid of this when arange can directly take tensors for bounds +# (so that it can be traced directly). +- func: _dim_arange(Tensor like, int dim) -> Tensor + +- func: argmax(Tensor self, int? 
dim=None, bool keepdim=False) -> Tensor + structured_delegate: argmax.out + device_check: NoCheck # TensorIterator + variants: function, method + tags: [core, reduction] + +- func: argmax.out(Tensor self, int? dim=None, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + structured: True + dispatch: + CPU, CUDA: argmax_out + MPS: argmax_out_mps + tags: reduction + +- func: argmin(Tensor self, int? dim=None, bool keepdim=False) -> Tensor + structured_delegate: argmin.out + device_check: NoCheck # TensorIterator + variants: function, method + tags: [core, reduction] + +- func: argmin.out(Tensor self, int? dim=None, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + structured: True + dispatch: + CPU, CUDA: argmin_out + MPS: argmin_out_mps + tags: reduction + +- func: acosh(Tensor self) -> Tensor + variants: function, method + structured_delegate: acosh.out + tags: [core, pointwise] + +- func: acosh_(Tensor(a!) self) -> Tensor(a!) + variants: function, method + structured_delegate: acosh.out + tags: pointwise + +- func: acosh.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: acosh_out + MPS: acosh_out_mps + tags: pointwise +# arccosh, alias for acosh + +- func: arccosh(Tensor self) -> Tensor + variants: function, method + +- func: arccosh_(Tensor(a!) self) -> Tensor(a!) + variants: function, method + +- func: arccosh.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + +- func: asinh(Tensor self) -> Tensor + variants: function, method + structured_delegate: asinh.out + dispatch: + SparseCPU, SparseCUDA, SparseMPS: asinh_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: asinh_sparse_csr + tags: [core, pointwise] + +- func: asinh_(Tensor(a!) self) -> Tensor(a!) + variants: function, method + structured_delegate: asinh.out + dispatch: + SparseCPU, SparseCUDA, SparseMPS: asinh_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: asinh_sparse_csr_ + tags: pointwise + +- func: asinh.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: asinh_out + MPS: asinh_out_mps + SparseCPU, SparseCUDA, SparseMPS: asinh_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: asinh_sparse_csr_out + tags: pointwise + +# arcsinh, alias for asinh +- func: arcsinh(Tensor self) -> Tensor + variants: function, method + +- func: arcsinh_(Tensor(a!) self) -> Tensor(a!) + variants: function, method + +- func: arcsinh.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + +- func: atanh(Tensor self) -> Tensor + structured_delegate: atanh.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: atanh_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: atanh_sparse_csr + tags: [core, pointwise] + +- func: atanh_(Tensor(a!) self) -> Tensor(a!) + structured_delegate: atanh.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: atanh_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: atanh_sparse_csr_ + tags: pointwise + +- func: atanh.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: atanh_out + MPS: atanh_out_mps + SparseCPU, SparseCUDA, SparseMPS: atanh_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: atanh_sparse_csr_out + tags: pointwise +# arctanh, alias for atanh + +- func: arctanh(Tensor self) -> Tensor + variants: function, method + +- func: arctanh_(Tensor(a!) self) -> Tensor(a!) 
+ variants: function, method + +- func: arctanh.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + +- func: as_strided(Tensor(a) self, SymInt[] size, SymInt[] stride, SymInt? storage_offset=None) -> Tensor(a) + variants: function, method + dispatch: + ZeroTensor, CPU, CUDA, MTIA, MPS: as_strided_tensorimpl + Meta: as_strided_tensorimpl_meta_symint + QuantizedCPU, QuantizedCUDA: as_strided_qtensorimpl + device_check: NoCheck + device_guard: False + tags: core + +- func: as_strided_(Tensor(a!) self, SymInt[] size, SymInt[] stride, SymInt? storage_offset=None) -> Tensor(a!) + use_const_ref_for_mutable_tensors: True + variants: function, method + device_check: NoCheck + device_guard: False + tags: inplace_view + dispatch: + CompositeExplicitAutogradNonFunctional: as_strided__symint + +- func: asin(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: asin.out + dispatch: + SparseCPU, SparseCUDA, SparseMPS: asin_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: asin_sparse_csr + tags: [core, pointwise] + +- func: asin_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: asin.out + dispatch: + SparseCPU, SparseCUDA, SparseMPS: asin_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: asin_sparse_csr_ + tags: pointwise + +- func: asin.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: asin_out + SparseCPU, SparseCUDA, SparseMPS: asin_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: asin_sparse_csr_out + tags: pointwise + +# arcsin, alias of asin +- func: arcsin(Tensor self) -> Tensor + variants: function, method + +- func: arcsin_(Tensor(a!) self) -> Tensor(a!) + variants: function, method + +- func: arcsin.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + +- func: atan(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: atan.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: atan_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: atan_sparse_csr + tags: [core, pointwise] + +- func: atan_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: atan.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: atan_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: atan_sparse_csr_ + tags: pointwise + +- func: atan.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: atan_out + SparseCPU, SparseCUDA, SparseMPS: atan_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: atan_sparse_csr_out + tags: pointwise + +# arctan, alias of atan +- func: arctan(Tensor self) -> Tensor + variants: function, method + +- func: arctan_(Tensor(a!) self) -> Tensor(a!) + variants: function, method + +- func: arctan.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) 
+ +- func: atleast_1d(Tensor self) -> Tensor + variants: function + tags: maybe_aliasing_or_mutating + +- func: atleast_1d.Sequence(Tensor[] tensors) -> Tensor[] + +- func: atleast_2d(Tensor self) -> Tensor + variants: function + tags: maybe_aliasing_or_mutating + +- func: atleast_2d.Sequence(Tensor[] tensors) -> Tensor[] + variants: function + +- func: atleast_3d(Tensor self) -> Tensor + variants: function + tags: maybe_aliasing_or_mutating + +- func: atleast_3d.Sequence(Tensor[] tensors) -> Tensor[] + variants: function + +- func: baddbmm(Tensor self, Tensor batch1, Tensor batch2, *, Scalar beta=1, Scalar alpha=1) -> Tensor + variants: function, method + structured_delegate: baddbmm.out + +- func: baddbmm_(Tensor(a!) self, Tensor batch1, Tensor batch2, *, Scalar beta=1, Scalar alpha=1) -> Tensor(a!) + variants: method + structured_delegate: baddbmm.out + +- func: baddbmm.out(Tensor self, Tensor batch1, Tensor batch2, *, Scalar beta=1, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!) + structured: True + variants: function + dispatch: + CPU: baddbmm_out_cpu + CUDA: baddbmm_out_cuda + MPS: baddbmm_out_mps + XPU: baddbmm_out_xpu + MTIA: baddbmm_out_mtia + SparseCsrCUDA: baddbmm_out_sparse_csr_cuda + +- func: baddbmm.dtype(Tensor self, Tensor batch1, Tensor batch2, ScalarType out_dtype, *, Scalar beta=1, Scalar alpha=1) -> Tensor + variants: function + dispatch: + CUDA: _baddbmm_dtype_cuda + +- func: baddbmm.dtype_out(Tensor self, Tensor batch1, Tensor batch2, ScalarType out_dtype, *, Scalar beta=1, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!) + variants: function + dispatch: + CUDA: _baddbmm_out_dtype_cuda + +- func: bartlett_window(int window_length, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: bartlett_window + autogen: bartlett_window.out + +- func: bartlett_window.periodic(int window_length, bool periodic, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: bartlett_window + autogen: bartlett_window.periodic_out + +- func: batch_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float momentum, float eps, bool cudnn_enabled) -> Tensor + tags: maybe_aliasing_or_mutating + +- func: quantized_batch_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor mean, Tensor var, float eps, float output_scale, int output_zero_point) -> Tensor + dispatch: + QuantizedCPU: quantized_batch_norm + autogen: quantized_batch_norm.out + +- func: _batch_norm_impl_index(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor, Tensor, Tensor, Tensor, int) + tags: maybe_aliasing_or_mutating + +- func: _batch_norm_impl_index_backward(int impl_index, Tensor input, Tensor grad_output, Tensor? weight, Tensor? running_mean, Tensor? running_var, Tensor? save_mean, Tensor? save_var_transform, bool train, float eps, bool[3] output_mask, Tensor reservedSpace) -> (Tensor, Tensor, Tensor) + +# Sample bernoulli with values in `self` as probability. +- func: bernoulli(Tensor self, *, Generator? generator=None) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: bernoulli + tags: nondeterministic_seeded + +- func: bernoulli.out(Tensor self, *, Generator? generator=None, Tensor(a!) out) -> Tensor(a!) 
+ device_check: NoCheck # TensorIterator + variants: function + tags: nondeterministic_seeded + dispatch: + CPU, CUDA: bernoulli_out + MPS: bernoulli_out_mps + +- func: bernoulli_.Tensor(Tensor(a!) self, Tensor p, *, Generator? generator=None) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + tags: nondeterministic_seeded + dispatch: + CPU, CUDA: bernoulli_ + MPS: bernoulli_mps_ + autogen: bernoulli.Tensor, bernoulli.Tensor_out + +- func: bernoulli_.float(Tensor(a!) self, float p=0.5, *, Generator? generator=None) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + tags: nondeterministic_seeded + dispatch: + CPU, CUDA: bernoulli_ + MPS: bernoulli_mps_ + autogen: bernoulli.float_out + +# Note [bernoulli.p schema] +# We should probably just fix the overload ambiguity by appending a _functional to the C++ API name (BC breaking) +# This out-of-place version isn't used explicitly, but needed by jit. +# There is no default valid on `p` here because it would introduce ambiguity +# with `bernoulli(Tensor self, *, Generator? generator=None)` declaration. +- func: bernoulli.p(Tensor self, float p, *, Generator? generator=None) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutogradNonFunctional: bernoulli + +- func: bilinear(Tensor input1, Tensor input2, Tensor weight, Tensor? bias=None) -> Tensor + +- func: binary_cross_entropy(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean) -> Tensor + device_check: NoCheck # TensorIterator + python_module: nn + variants: function + dispatch: + CPU: binary_cross_entropy_cpu + CUDA: binary_cross_entropy_cuda + MPS: binary_cross_entropy_mps + +- func: binary_cross_entropy.out(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + python_module: nn + variants: function + dispatch: + CPU: binary_cross_entropy_out_cpu + CUDA: binary_cross_entropy_out_cuda + MPS: binary_cross_entropy_out_mps + +- func: binary_cross_entropy_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean) -> Tensor + python_module: nn + variants: function + dispatch: + CPU: binary_cross_entropy_backward_cpu + CUDA: binary_cross_entropy_backward_cuda + MPS: binary_cross_entropy_backward_mps + +- func: binary_cross_entropy_backward.grad_input(Tensor grad_output, Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + variants: function + dispatch: + CPU: binary_cross_entropy_backward_out_cpu + CUDA: binary_cross_entropy_backward_out_cuda + MPS: binary_cross_entropy_backward_out_mps + +- func: binary_cross_entropy_with_logits(Tensor self, Tensor target, Tensor? weight=None, Tensor? pos_weight=None, int reduction=Mean) -> Tensor + device_check: NoCheck # TensorIterator + variants: function + dispatch: + CompositeExplicitAutograd: binary_cross_entropy_with_logits + autogen: binary_cross_entropy_with_logits.out + +- func: bincount(Tensor self, Tensor? 
weights=None, SymInt minlength=0) -> Tensor + variants: function, method + dispatch: + CPU: _bincount_cpu + CUDA: _bincount_cuda + MPS: _bincount_mps + tags: dynamic_output_shape + autogen: bincount.out + +- func: bitwise_not(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: bitwise_not.out + variants: function, method + tags: [core, pointwise] + +- func: bitwise_not_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: bitwise_not.out + variants: method + tags: pointwise + +- func: bitwise_not.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS, MTIA: bitwise_not_out + tags: pointwise + +- func: copysign.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: copysign_out + tags: pointwise + +- func: copysign.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: copysign.out + tags: pointwise + +- func: copysign_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + structured_delegate: copysign.out + +- func: copysign.Scalar(Tensor self, Scalar other) -> Tensor + variants: function, method + dispatch: + CompositeExplicitAutograd: copysign + tags: pointwise + +- func: copysign_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + variants: method + dispatch: + CompositeExplicitAutograd: copysign_ + +- func: copysign.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: copysign_out + tags: pointwise + +- func: _lazy_clone(Tensor self) -> Tensor + # Like clone, but the copy takes place lazily, only if either the + # input or the output are written. + variants: function, method + dispatch: + CompositeExplicitAutograd: _lazy_clone + +- func: logical_not(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: logical_not + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_logical_not + tags: [core, pointwise] + +- func: logical_not_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CompositeExplicitAutograd: logical_not_ + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_logical_not_ + tags: pointwise + +- func: logical_not.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: logical_not_out + MPS: logical_not_out_mps + tags: pointwise + +- func: logical_xor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: logical_xor + tags: [core, pointwise] + +- func: logical_xor_(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CompositeExplicitAutograd: logical_xor_ + tags: pointwise + +- func: logical_xor.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) 
+ device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA: logical_xor_out + MPS: logical_xor_out_mps + tags: pointwise + +- func: logical_and(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: logical_and + tags: [core, pointwise] + +- func: logical_and_(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CompositeExplicitAutograd: logical_and_ + tags: pointwise + +- func: logical_and.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: logical_and_out + MPS: logical_and_out_mps + tags: pointwise + +- func: logical_or(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: logical_or + tags: [core, pointwise] + +- func: logical_or_(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CompositeExplicitAutograd: logical_or_ + tags: pointwise + +- func: logical_or.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: logical_or_out + MPS: logical_or_out_mps + tags: pointwise + +- func: blackman_window(int window_length, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: blackman_window + autogen: blackman_window.out + +- func: blackman_window.periodic(int window_length, bool periodic, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: blackman_window + autogen: blackman_window.periodic_out + +- func: bmm(Tensor self, Tensor mat2) -> Tensor + structured_delegate: bmm.out + variants: function, method + dispatch: + SparseCPU: bmm_sparse_cpu + SparseCUDA: bmm_sparse_cuda + SparseMPS: bmm_sparse_mps + NestedTensorCPU: bmm_nested + NestedTensorCUDA: bmm_nested_cuda + tags: core + +- func: bmm.out(Tensor self, Tensor mat2, *, Tensor(a!) out) -> Tensor(a!) + structured: True + variants: function + dispatch: + CPU: bmm_out_cpu + CUDA: bmm_out_cuda + MPS: bmm_out_mps + XPU: bmm_out_xpu + MTIA: bmm_out_mtia + SparseCPU: bmm_out_sparse_cpu + SparseCUDA: bmm_out_sparse_cuda + SparseMPS: bmm_out_sparse_mps + SparseCsrCUDA: bmm_out_sparse_csr_cuda + +- func: bmm.dtype(Tensor self, Tensor mat2, ScalarType out_dtype) -> Tensor + variants: function + dispatch: + CUDA: _bmm_dtype_cuda + +- func: bmm.dtype_out(Tensor self, Tensor mat2, ScalarType out_dtype, *, Tensor(a!) out) -> Tensor(a!) 
+ variants: function + dispatch: + CUDA: _bmm_out_dtype_cuda + +- func: broadcast_tensors(Tensor[] tensors) -> Tensor[] + device_check: NoCheck + device_guard: False + +- func: broadcast_to(Tensor(a) self, SymInt[] size) -> Tensor(a) + variants: function, method + dispatch: + CompositeImplicitAutograd: broadcast_to_symint + +- func: _sparse_broadcast_to(Tensor(a) self, int[] size) -> Tensor(a) + variants: function + dispatch: + SparseCPU, SparseCUDA, SparseMPS: sparse_broadcast_to + +- func: cat(Tensor[] tensors, int dim=0) -> Tensor + structured_delegate: cat.out + dispatch: + SparseCPU, SparseCUDA, SparseMPS: cat_sparse + QuantizedCPU: cat_quantized_cpu + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: cat_nested + tags: core + +- func: cat.out(Tensor[] tensors, int dim=0, *, Tensor(a!) out) -> Tensor(a!) + structured: True + precomputed: + - dim -> int dim, int valid, bool all_contiguous, bool all_same_dtype, bool all_same_sizes_and_stride, MemoryFormat memory_format + dispatch: + CPU: cat_out_cpu + CUDA: cat_out_cuda + MPS: cat_out_mps + QuantizedCPU: cat_out_quantized_cpu + +- func: cat.names(Tensor[] tensors, Dimname dim) -> Tensor + +- func: cat.names_out(Tensor[] tensors, Dimname dim, *, Tensor(a!) out) -> Tensor(a!) + +# alias for torch.cat +- func: concat(Tensor[] tensors, int dim=0) -> Tensor + +- func: concat.out(Tensor[] tensors, int dim=0, *, Tensor(a!) out) -> Tensor(a!) + +- func: concat.names(Tensor[] tensors, Dimname dim) -> Tensor + +- func: concat.names_out(Tensor[] tensors, Dimname dim, *, Tensor(a!) out) -> Tensor(a!) + +# alias for torch.cat +- func: concatenate(Tensor[] tensors, int dim=0) -> Tensor + +- func: concatenate.out(Tensor[] tensors, int dim=0, *, Tensor(a!) out) -> Tensor(a!) + +- func: concatenate.names(Tensor[] tensors, Dimname dim) -> Tensor + +- func: concatenate.names_out(Tensor[] tensors, Dimname dim, *, Tensor(a!) out) -> Tensor(a!) + +- func: block_diag(Tensor[] tensors) -> Tensor + variants: function + dispatch: + CompositeExplicitAutograd: block_diag + autogen: block_diag.out + +- func: ceil(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: ceil.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: ceil_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: ceil_sparse_csr + tags: [core, pointwise] + +- func: ceil_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: ceil.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: ceil_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: ceil_sparse_csr_ + tags: pointwise + +- func: ceil.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: ceil_out + SparseCPU, SparseCUDA, SparseMPS: ceil_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: ceil_sparse_csr_out + tags: pointwise + +# alias for torch.linalg.multi_dot +- func: chain_matmul(Tensor[] matrices) -> Tensor + variants: function + +# alias for torch.linalg.multi_dot +- func: chain_matmul.out(Tensor[] matrices, *, Tensor(a!) out) -> Tensor(a!) 
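+# `concat` and `concatenate` are pure aliases of `cat`, so all three produce
+# identical results. A quick sketch (illustrative, assuming the public torch API):
+#
+#   import torch
+#   xs = [torch.ones(2, 3), torch.zeros(1, 3)]
+#   torch.cat(xs, dim=0).shape      # torch.Size([3, 3])
+#   torch.concat(xs, dim=0).shape   # same result; alias of cat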
+ +- func: unsafe_chunk(Tensor self, int chunks, int dim=0) -> Tensor[] + variants: function, method + device_check: NoCheck + device_guard: False + tags: maybe_aliasing_or_mutating + +- func: chunk(Tensor(a -> *) self, int chunks, int dim=0) -> Tensor(a)[] + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeImplicitAutograd: chunk + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: chunk_nested_tensor + +- func: tensor_split.sections(Tensor(a -> *) self, SymInt sections, int dim=0) -> Tensor(a)[] + variants: function, method + dispatch: + CompositeImplicitAutograd: tensor_split_sections_symint + +- func: tensor_split.indices(Tensor(a -> *) self, SymInt[] indices, int dim=0) -> Tensor(a)[] + variants: function, method + dispatch: + CompositeImplicitAutograd: tensor_split_indices_symint + +- func: tensor_split.tensor_indices_or_sections(Tensor(a -> *) self, Tensor tensor_indices_or_sections, int dim=0) -> Tensor(a)[] + variants: function, method + +- func: clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + cpp_no_default_args: ['min'] + structured_delegate: clamp.out + dispatch: + QuantizedCPU: clamp_quantized_cpu + tags: [core, pointwise] + +- func: clamp.Tensor(Tensor self, Tensor? min=None, Tensor? max=None) -> Tensor + variants: function, method + structured_delegate: clamp.Tensor_out + tags: [core, pointwise] + +- func: clamp_(Tensor(a!) self, Scalar? min=None, Scalar? max=None) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function, method + cpp_no_default_args: ['min'] + structured_delegate: clamp.out + tags: pointwise + +- func: clamp_.Tensor(Tensor(a!) self, Tensor? min=None, Tensor? max=None) -> Tensor(a!) + variants: function, method + structured_delegate: clamp.Tensor_out + tags: pointwise + +- func: clamp.out(Tensor self, Scalar? min=None, Scalar? max=None, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + cpp_no_default_args: ['min'] + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MTIA, MPS: clamp_out + tags: pointwise + +- func: clamp.Tensor_out(Tensor self, Tensor? min=None, Tensor? max=None, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: clamp_Tensor_out + tags: pointwise + +- func: clamp_max(Tensor self, Scalar max) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: clamp_max.out + tags: pointwise + +- func: clamp_max.Tensor(Tensor self, Tensor max) -> Tensor + variants: function, method + structured_delegate: clamp_max.Tensor_out + tags: pointwise + +- func: clamp_max_(Tensor(a!) self, Scalar max) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: clamp_max.out + tags: pointwise + +- func: clamp_max_.Tensor(Tensor(a!) self, Tensor max) -> Tensor(a!) + variants: function, method + structured_delegate: clamp_max.Tensor_out + tags: pointwise + +- func: clamp_max.out(Tensor self, Scalar max, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MTIA, MPS: clamp_max_out + tags: pointwise + +- func: clamp_max.Tensor_out(Tensor self, Tensor max, *, Tensor(a!) out) -> Tensor(a!) 
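+# `clamp` limits values to [min, max]; either bound may be omitted. A quick
+# sketch (illustrative, assuming the public torch API):
+#
+#   import torch
+#   x = torch.tensor([-2.0, 0.5, 3.0])
+#   torch.clamp(x, min=0.0, max=1.0)   # tensor([0.0000, 0.5000, 1.0000])
+#   torch.clamp(x, min=0.0)            # no upper bound: tensor([0.0000, 0.5000, 3.0000])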
+ device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: clamp_max_Tensor_out + tags: pointwise + +- func: clamp_min(Tensor self, Scalar min) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: clamp_min.out + tags: pointwise + +- func: clamp_min.Tensor(Tensor self, Tensor min) -> Tensor + variants: function, method + structured_delegate: clamp_min.Tensor_out + tags: pointwise + +- func: clamp_min_(Tensor(a!) self, Scalar min) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: clamp_min.out + tags: pointwise + +- func: clamp_min_.Tensor(Tensor(a!) self, Tensor min) -> Tensor(a!) + variants: function, method + structured_delegate: clamp_min.Tensor_out + tags: pointwise + +- func: clamp_min.out(Tensor self, Scalar min, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MTIA, MPS: clamp_min_out + tags: pointwise + +- func: clamp_min.Tensor_out(Tensor self, Tensor min, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: clamp_min_Tensor_out + tags: pointwise + +# clip is an alias for clamp +- func: clip(Tensor self, Scalar? min=None, Scalar? max=None) -> Tensor + cpp_no_default_args: ['min'] + variants: function, method + tags: pointwise + +- func: clip.Tensor(Tensor self, Tensor? min=None, Tensor? max=None) -> Tensor + variants: function, method + tags: pointwise + +- func: clip_(Tensor(a!) self, Scalar? min=None, Scalar? max=None) -> Tensor(a!) + cpp_no_default_args: ['min'] + variants: function, method + tags: pointwise + +- func: clip_.Tensor(Tensor(a!) self, Tensor? min=None, Tensor? max=None) -> Tensor(a!) + variants: function, method + tags: pointwise + +- func: clip.out(Tensor self, Scalar? min=None, Scalar? max=None, *, Tensor(a!) out) -> Tensor(a!) + cpp_no_default_args: ['min'] + tags: pointwise + +- func: clip.Tensor_out(Tensor self, Tensor? min=None, Tensor? max=None, *, Tensor(a!) out) -> Tensor(a!) + +- func: cudnn_is_acceptable(Tensor self) -> bool + device_check: NoCheck + device_guard: False + +- func: complex(Tensor real, Tensor imag) -> Tensor + variants: function + dispatch: + CompositeExplicitAutograd: complex + +- func: complex.out(Tensor real, Tensor imag, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA, MPS: complex_out + +- func: polar(Tensor abs, Tensor angle) -> Tensor + variants: function + dispatch: + CompositeExplicitAutograd: polar + +- func: polar.out(Tensor abs, Tensor angle, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA, MPS: polar_out + +- func: constant_pad_nd(Tensor self, SymInt[] pad, Scalar value=0) -> Tensor + variants: function + dispatch: + CompositeExplicitAutograd: constant_pad_nd + MPS: constant_pad_nd_mps + autogen: constant_pad_nd.out + tags: core + +- func: contiguous(Tensor(a) self, *, MemoryFormat memory_format=contiguous_format) -> Tensor(a) + variants: method + manual_cpp_binding: True + +- func: convolution(Tensor input, Tensor weight, Tensor? 
bias, SymInt[] stride, SymInt[] padding, SymInt[] dilation, bool transposed, SymInt[] output_padding, SymInt groups) -> Tensor + dispatch: + CompositeExplicitAutograd: convolution + autogen: convolution.out + tags: core + +- func: convolution_backward(Tensor grad_output, Tensor input, Tensor weight, SymInt[]? bias_sizes, SymInt[] stride, SymInt[] padding, SymInt[] dilation, bool transposed, SymInt[] output_padding, SymInt groups, bool[3] output_mask) -> (Tensor, Tensor, Tensor) + dispatch: + CompositeExplicitAutograd, CUDA: convolution_backward + autogen: convolution_backward.out + tags: core + +- func: convolution_overrideable(Tensor input, Tensor weight, Tensor? bias, SymInt[] stride, SymInt[] padding, SymInt[] dilation, bool transposed, SymInt[] output_padding, SymInt groups) -> Tensor + dispatch: + CompositeExplicitAutograd: convolution_overrideable + autogen: convolution_overrideable.out + +- func: convolution_backward_overrideable(Tensor grad_output, Tensor input, Tensor weight, SymInt[] stride, SymInt[] padding, SymInt[] dilation, bool transposed, SymInt[] output_padding, SymInt groups, bool[3] output_mask) -> (Tensor grad_input, Tensor grad_weight, Tensor grad_bias) + dispatch: + CompositeExplicitAutograd: convolution_backward_overrideable + autogen: convolution_backward_overrideable.out + +- func: _convolution(Tensor input, Tensor weight, Tensor? bias, SymInt[] stride, SymInt[] padding, SymInt[] dilation, bool transposed, SymInt[] output_padding, SymInt groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> Tensor + dispatch: + CompositeExplicitAutograd: _convolution + autogen: _convolution.out + +- func: _convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, SymInt[] stride, SymInt[] padding, SymInt[] dilation, bool transposed, int[] output_padding, SymInt groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> Tensor + +- func: _convolution_mode(Tensor input, Tensor weight, Tensor? bias, SymInt[] stride, str padding, SymInt[] dilation, SymInt groups) -> Tensor + dispatch: + CompositeImplicitAutograd: _convolution_mode_symint + +- func: _convolution_double_backward(Tensor? ggI, Tensor? ggW, Tensor? ggb, Tensor gO, Tensor weight, Tensor self, SymInt[] stride, SymInt[] padding, SymInt[] dilation, bool transposed, SymInt[] output_padding, SymInt groups, bool[3] output_mask) -> (Tensor, Tensor, Tensor) + +- func: conv1d(Tensor input, Tensor weight, Tensor? bias=None, SymInt[1] stride=1, SymInt[1] padding=0, SymInt[1] dilation=1, SymInt groups=1) -> Tensor + dispatch: + CompositeImplicitAutograd: conv1d_symint + +- func: conv2d(Tensor input, Tensor weight, Tensor? bias=None, SymInt[2] stride=1, SymInt[2] padding=0, SymInt[2] dilation=1, SymInt groups=1) -> Tensor + dispatch: + CompositeImplicitAutograd: conv2d_symint + +- func: conv3d(Tensor input, Tensor weight, Tensor? bias=None, SymInt[3] stride=1, SymInt[3] padding=0, SymInt[3] dilation=1, SymInt groups=1) -> Tensor + dispatch: + CompositeImplicitAutograd: conv3d_symint + +- func: conv1d.padding(Tensor input, Tensor weight, Tensor? bias=None, SymInt[1] stride=1, str padding="valid", SymInt[1] dilation=1, SymInt groups=1) -> Tensor + cpp_no_default_args: ['bias', 'stride', 'padding'] + dispatch: + CompositeImplicitAutograd: conv1d_padding_symint + +- func: conv2d.padding(Tensor input, Tensor weight, Tensor? 
bias=None, SymInt[2] stride=1, str padding="valid", SymInt[2] dilation=1, SymInt groups=1) -> Tensor + cpp_no_default_args: ['bias', 'stride', 'padding'] + dispatch: + CompositeImplicitAutograd: conv2d_padding_symint + +- func: conv3d.padding(Tensor input, Tensor weight, Tensor? bias=None, SymInt[3] stride=1, str padding="valid", SymInt[3] dilation=1, SymInt groups=1) -> Tensor + cpp_no_default_args: ['bias', 'stride', 'padding'] + dispatch: + CompositeImplicitAutograd: conv3d_padding_symint + +- func: conv_tbc(Tensor self, Tensor weight, Tensor bias, int pad=0) -> Tensor + dispatch: + CompositeExplicitAutograd: conv_tbc + autogen: conv_tbc.out + +- func: conv_tbc_backward(Tensor self, Tensor input, Tensor weight, Tensor bias, int pad) -> (Tensor, Tensor, Tensor) + +# NB: we inherit the goofy argument order from PyTorch torch.nn.functional +- func: conv_transpose1d(Tensor input, Tensor weight, Tensor? bias=None, SymInt[1] stride=1, SymInt[1] padding=0, SymInt[1] output_padding=0, SymInt groups=1, SymInt[1] dilation=1) -> Tensor + dispatch: + CompositeImplicitAutograd: conv_transpose1d_symint + +- func: conv_transpose2d.input(Tensor input, Tensor weight, Tensor? bias=None, SymInt[2] stride=1, SymInt[2] padding=0, SymInt[2] output_padding=0, SymInt groups=1, SymInt[2] dilation=1) -> Tensor + dispatch: + CompositeImplicitAutograd: conv_transpose2d_symint + +- func: conv_transpose3d.input(Tensor input, Tensor weight, Tensor? bias=None, SymInt[3] stride=1, SymInt[3] padding=0, SymInt[3] output_padding=0, SymInt groups=1, SymInt[3] dilation=1) -> Tensor + dispatch: + CompositeImplicitAutograd: conv_transpose3d_symint + +- func: copy(Tensor self, Tensor src, bool non_blocking=False) -> Tensor + variants: function + dispatch: + Meta: copy_meta + CompositeExplicitAutogradNonFunctional: copy + tags: core + +- func: copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!) + variants: method + device_check: NoCheck + device_guard: False + dispatch: + MkldnnCPU: copy_mkldnn_ + SparseCPU, SparseCUDA, SparseMPS: copy_sparse_wrapper_ + CompositeExplicitAutograd: copy_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: copy_sparse_compressed_ + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: copy_nested_ + autogen: copy.out + +- func: _copy_from(Tensor self, Tensor dst, bool non_blocking=False) -> Tensor + dispatch: + MPS: _copy_from_mps + autogen: _copy_from.out + +# We need this to be able to properly copy from a CPU to an XLA tensor with different sizes. +# See https://github.com/pytorch/xla/issues/2881 +- func: _copy_from_and_resize(Tensor self, Tensor dst) -> Tensor + dispatch: + MPS: _copy_from_and_resize_mps + autogen: _copy_from_and_resize.out + +- func: cos(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: cos.out + dispatch: + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_cos + tags: [core, pointwise] + +- func: cos_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: cos.out + tags: pointwise + +- func: cos.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS, MTIA: cos_out + tags: pointwise + +- func: cosh(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: cosh.out + tags: [core, pointwise] + +- func: cosh_(Tensor(a!) 
self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: cosh.out + tags: pointwise + +- func: cosh.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: cosh_out + tags: pointwise + +- func: cosine_embedding_loss(Tensor input1, Tensor input2, Tensor target, float margin=0.0, int reduction=Mean) -> Tensor + +- func: count_nonzero.dim_IntList(Tensor self, int[] dim) -> Tensor + variants: function, method + dispatch: + CPU: count_nonzero_cpu + CUDA: count_nonzero_cuda + MPS: count_nonzero_mps + autogen: count_nonzero.dim_IntList_out + tags: reduction + +- func: count_nonzero(Tensor self, int? dim=None) -> Tensor + variants: function, method + dispatch: + CompositeExplicitAutograd: count_nonzero + autogen: count_nonzero.out + tags: reduction + +- func: cov(Tensor self, *, int correction=1, Tensor? fweights=None, Tensor? aweights=None) -> Tensor + variants: function, method + +- func: corrcoef(Tensor self) -> Tensor + variants: function, method + +- func: cudnn_affine_grid_generator(Tensor theta, int N, int C, int H, int W) -> Tensor grid + dispatch: + CUDA: cudnn_affine_grid_generator_forward + autogen: cudnn_affine_grid_generator.out + +# TODO: Why do I have to call this grad?! +- func: cudnn_affine_grid_generator_backward(Tensor grad, int N, int C, int H, int W) -> Tensor grad_theta + dispatch: + CUDA: cudnn_affine_grid_generator_backward + autogen: cudnn_affine_grid_generator_backward.out + +- func: cudnn_batch_norm(Tensor input, Tensor weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float exponential_average_factor, float epsilon) -> (Tensor, Tensor, Tensor, Tensor) + dispatch: + CUDA: cudnn_batch_norm + +- func: cudnn_batch_norm.out(Tensor input, Tensor weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float exponential_average_factor, float epsilon, *, Tensor(a!) out0, Tensor(b!) out1, Tensor(c!) out2, Tensor(d!) out3) -> (Tensor(a!), Tensor(b!), Tensor(c!), Tensor(d!)) + dispatch: + CUDA: cudnn_batch_norm_out + +# NB: You can only use this if you used cudnn_batch_norm training=True +- func: cudnn_batch_norm_backward(Tensor input, Tensor grad_output, Tensor weight, Tensor? running_mean, Tensor? running_var, Tensor? save_mean, Tensor? save_var, float epsilon, Tensor reserveSpace) -> (Tensor, Tensor, Tensor) + dispatch: + CUDA: cudnn_batch_norm_backward + autogen: cudnn_batch_norm_backward.out + +- func: cudnn_convolution(Tensor self, Tensor weight, SymInt[] padding, SymInt[] stride, SymInt[] dilation, SymInt groups, bool benchmark, bool deterministic, bool allow_tf32) -> Tensor + dispatch: + CUDA: cudnn_convolution + +- func: cudnn_convolution.out(Tensor self, Tensor weight, SymInt[] padding, SymInt[] stride, SymInt[] dilation, SymInt groups, bool benchmark, bool deterministic, bool allow_tf32, *, Tensor(a!) out) -> Tensor(a!) 
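+# The cudnn_* kernels above are not called directly from Python; they are
+# reached when convolutions and batch norm run on CUDA tensors with cuDNN
+# enabled. A hedged sketch (illustrative only):
+#
+#   import torch
+#   if torch.backends.cudnn.is_available():
+#       torch.backends.cudnn.benchmark = True      # autotune conv algorithms
+#       x = torch.randn(8, 3, 32, 32, device="cuda")
+#       w = torch.randn(16, 3, 3, 3, device="cuda")
+#       y = torch.nn.functional.conv2d(x, w)       # may dispatch to cudnn_convolution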
+ dispatch: + CUDA: cudnn_convolution_out + +- func: cudnn_convolution_transpose(Tensor self, Tensor weight, SymInt[] padding, SymInt[] output_padding, SymInt[] stride, SymInt[] dilation, SymInt groups, bool benchmark, bool deterministic, bool allow_tf32) -> Tensor + dispatch: + CUDA: cudnn_convolution_transpose + autogen: cudnn_convolution_transpose.out + +- func: _mps_convolution_transpose(Tensor self, Tensor weight, SymInt[] padding, SymInt[] output_padding, SymInt[] stride, SymInt[] dilation, SymInt groups) -> Tensor + dispatch: + MPS: _mps_convolution_transpose + autogen: _mps_convolution_transpose.out + +- func: mps_convolution_transpose_backward(Tensor self, Tensor grad_output, Tensor weight, SymInt[] padding, SymInt[] output_padding, SymInt[] stride, SymInt[] dilation, SymInt groups, bool[2] output_mask) -> (Tensor, Tensor) + dispatch: + MPS: mps_convolution_transpose_backward + autogen: mps_convolution_transpose_backward.out + +- func: cudnn_convolution_relu(Tensor self, Tensor weight, Tensor? bias, SymInt[] stride, SymInt[] padding, SymInt[] dilation, SymInt groups) -> Tensor + dispatch: + CUDA: cudnn_convolution_relu + autogen: cudnn_convolution_relu.out + +- func: cudnn_convolution_add_relu(Tensor self, Tensor weight, Tensor z, Scalar? alpha, Tensor? bias, SymInt[] stride, SymInt[] padding, SymInt[] dilation, SymInt groups) -> Tensor + dispatch: + CUDA: cudnn_convolution_add_relu + autogen: cudnn_convolution_add_relu.out + +# NB: input is special cased in a way I don't quite understand +- func: cudnn_grid_sampler(Tensor self, Tensor grid) -> Tensor output + dispatch: + CUDA: cudnn_grid_sampler_forward + autogen: cudnn_grid_sampler.out + +- func: cudnn_grid_sampler_backward(Tensor self, Tensor grid, Tensor grad_output) -> (Tensor grad_self, Tensor grad_grid) + dispatch: + CUDA: cudnn_grid_sampler_backward + autogen: cudnn_grid_sampler_backward.out + +- func: cummax(Tensor self, int dim) -> (Tensor values, Tensor indices) + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: cummax + +- func: cummax.out(Tensor self, int dim, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices) + device_check: NoCheck # TensorIterator + dispatch: + CompositeExplicitAutograd: cummax_out + +- func: cummax.dimname(Tensor self, Dimname dim) -> (Tensor values, Tensor indices) + device_check: NoCheck # TensorIterator + variants: function, method + +- func: cummax.dimname_out(Tensor self, Dimname dim, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices) + device_check: NoCheck # TensorIterator + +- func: _cummax_helper(Tensor self, Tensor(a!) values, Tensor(b!) indices, int dim) -> () + variants: function + dispatch: + CPU: cummax_helper_cpu + CUDA: cummax_helper_cuda + MPS: cummax_helper_mps + +- func: cummin(Tensor self, int dim) -> (Tensor values, Tensor indices) + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: cummin + +- func: cummin.out(Tensor self, int dim, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices) + device_check: NoCheck # TensorIterator + dispatch: + CompositeExplicitAutograd: cummin_out + +- func: cummin.dimname(Tensor self, Dimname dim) -> (Tensor values, Tensor indices) + device_check: NoCheck # TensorIterator + variants: function, method + +- func: cummin.dimname_out(Tensor self, Dimname dim, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) 
indices) + device_check: NoCheck # TensorIterator + +- func: _cummin_helper(Tensor self, Tensor(a!) values, Tensor(b!) indices, int dim) -> () + variants: function + dispatch: + CPU: cummin_helper_cpu + CUDA: cummin_helper_cuda + MPS: cummin_helper_mps + +- func: cummaxmin_backward(Tensor grad, Tensor input, Tensor indices, int dim) -> Tensor + variants: function + device_check: NoCheck + device_guard: False + +- func: cumprod(Tensor self, int dim, *, ScalarType? dtype=None) -> Tensor + structured_delegate: cumprod.out + device_check: NoCheck # TensorIterator + variants: function, method + +- func: cumprod_(Tensor(a!) self, int dim, *, ScalarType? dtype=None) -> Tensor(a!) + structured_delegate: cumprod.out + variants: method + +- func: cumprod.out(Tensor self, int dim, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) + structured: True + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA: cumprod_out + MPS: cumprod_out_mps + +- func: cumprod.dimname(Tensor self, Dimname dim, *, ScalarType? dtype=None) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + +- func: cumprod_.dimname(Tensor(a!) self, Dimname dim, *, ScalarType? dtype=None) -> Tensor(a!) + variants: method + +- func: cumprod.dimname_out(Tensor self, Dimname dim, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + +- func: cumprod_backward(Tensor grad, Tensor input, int dim, Tensor output) -> Tensor + variants: function + device_check: NoCheck + device_guard: False + +- func: cumsum(Tensor self, int dim, *, ScalarType? dtype=None) -> Tensor + structured_delegate: cumsum.out + device_check: NoCheck # TensorIterator + variants: function, method + tags: core + +- func: cumsum_(Tensor(a!) self, int dim, *, ScalarType? dtype=None) -> Tensor(a!) + structured_delegate: cumsum.out + variants: method + +- func: cumsum.out(Tensor self, int dim, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) + structured: True + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA: cumsum_out + MPS: cumsum_out_mps + +- func: cumsum.dimname(Tensor self, Dimname dim, *, ScalarType? dtype=None) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + +- func: cumsum_.dimname(Tensor(a!) self, Dimname dim, *, ScalarType? dtype=None) -> Tensor(a!) + variants: method + +- func: cumsum.dimname_out(Tensor self, Dimname dim, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) 
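+# `cumsum` and `cumprod` are inclusive scans along `dim`. A quick sketch
+# (illustrative, assuming the public torch API):
+#
+#   import torch
+#   x = torch.tensor([1., 2., 3.])
+#   torch.cumsum(x, dim=0)    # tensor([1., 3., 6.])
+#   torch.cumprod(x, dim=0)   # tensor([1., 2., 6.])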
+ device_check: NoCheck # TensorIterator + +- func: cumulative_trapezoid.x(Tensor y, Tensor x, *, int dim=-1) -> Tensor + +- func: cumulative_trapezoid.dx(Tensor y, *, Scalar dx=1, int dim=-1) -> Tensor + +- func: ctc_loss.IntList(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank=0, int reduction=Mean, bool zero_infinity=False) -> Tensor + +# convenience function that converts to intlists for you +- func: ctc_loss.Tensor(Tensor log_probs, Tensor targets, Tensor input_lengths, Tensor target_lengths, int blank=0, int reduction=Mean, bool zero_infinity=False) -> Tensor + +- func: _ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank=0, bool zero_infinity=False) -> (Tensor, Tensor) + dispatch: + CPU: ctc_loss_cpu + CUDA: ctc_loss_gpu + Meta: ctc_loss_meta + autogen: _ctc_loss.out + tags: dynamic_output_shape # the shape of second output is data dependent + +- func: _ctc_loss.Tensor(Tensor log_probs, Tensor targets, Tensor input_lengths, Tensor target_lengths, int blank=0, bool zero_infinity=False) -> (Tensor, Tensor) + dispatch: + CPU, CUDA: ctc_loss_tensor + autogen: _ctc_loss.Tensor_out + tags: dynamic_output_shape # the shape of second output is data dependent + +- func: _ctc_loss_backward(Tensor grad, Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, Tensor neg_log_likelihood, Tensor log_alpha, int blank, bool zero_infinity=False) -> Tensor + dispatch: + CPU: ctc_loss_backward_cpu + CUDA: ctc_loss_backward_gpu + autogen: _ctc_loss_backward.out + +- func: _ctc_loss_backward.Tensor(Tensor grad, Tensor log_probs, Tensor targets, Tensor input_lengths, Tensor target_lengths, Tensor neg_log_likelihood, Tensor log_alpha, int blank, bool zero_infinity=False) -> Tensor + dispatch: + CPU, CUDA: ctc_loss_backward_tensor + +- func: diag_embed(Tensor self, int offset=0, int dim1=-2, int dim2=-1) -> Tensor + variants: function, method + dispatch: + CompositeExplicitAutogradNonFunctional: diag_embed + autogen: diag_embed.out + +- func: diagflat(Tensor self, int offset=0) -> Tensor + variants: function, method + +- func: diagonal(Tensor(a) self, int offset=0, int dim1=0, int dim2=1) -> Tensor(a) + variants: function, method + dispatch: + CompositeExplicitAutograd: diagonal + tags: core + +- func: linalg_diagonal(Tensor(a) A, *, int offset=0, int dim1=-2, int dim2=-1) -> Tensor(a) + python_module: linalg + variants: function + +- func: diagonal.Dimname(Tensor(a) self, *, Dimname outdim, Dimname dim1, Dimname dim2, int offset=0) -> Tensor(a) + variants: function, method + +- func: diagonal_backward(Tensor grad_output, SymInt[] input_sizes, int offset, int dim1, int dim2) -> Tensor + variants: function + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: diagonal_backward_symint + autogen: diagonal_backward.out + +- func: fill_diagonal_(Tensor(a!) self, Scalar fill_value, bool wrap=False) -> Tensor(a!) + variants: method + +- func: diff(Tensor self, int n=1, int dim=-1, Tensor? prepend=None, Tensor? append=None) -> Tensor + variants: function, method + +- func: diff.out(Tensor self, int n=1, int dim=-1, Tensor? prepend=None, Tensor? append=None, *, Tensor(a!) out) -> Tensor(a!) + variants: function + +- func: gradient.scalarint(Tensor self, *, Scalar? spacing=None, int? 
dim=None, int edge_order=1) -> Tensor[] + variants: function + +- func: gradient.scalararray(Tensor self, *, Scalar spacing, int[] dim, int edge_order=1) -> Tensor[] + variants: function + +- func: gradient.array(Tensor self, *, int[] dim, int edge_order=1) -> Tensor[] + variants: function + +- func: gradient.scalarrayint(Tensor self, *, Scalar[] spacing, int? dim=None, int edge_order=1) -> Tensor[] + variants: function + +- func: gradient.scalarrayarray(Tensor self, *, Scalar[] spacing, int[] dim, int edge_order=1) -> Tensor[] + variants: function + +- func: gradient.tensorarrayint(Tensor self, *, Tensor[] spacing, int? dim=None, int edge_order=1) -> Tensor[] + variants: function + +- func: gradient.tensorarray(Tensor self, *, Tensor[] spacing, int[] dim, int edge_order=1) -> Tensor[] + variants: function + +- func: div.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: div.out + dispatch: + SparseCPU, SparseCUDA, SparseMPS: div_sparse + ZeroTensor: div_zerotensor + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_div_Tensor + tags: [core, pointwise] + +- func: div_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + structured_delegate: div.out + dispatch: + SparseCPU, SparseCUDA, SparseMPS: div_sparse_ + tags: pointwise + +- func: div.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS, MTIA: div_out + SparseCPU, SparseCUDA, SparseMPS: div_out_sparse_zerodim + tags: pointwise + +- func: div.Tensor_mode(Tensor self, Tensor other, *, str? rounding_mode) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: div.out_mode + dispatch: + SparseCPU, SparseCUDA, SparseMPS: div_sparse + tags: [core, pointwise] + +- func: div_.Tensor_mode(Tensor(a!) self, Tensor other, *, str? rounding_mode) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + structured_delegate: div.out_mode + dispatch: + SparseCPU, SparseCUDA, SparseMPS: div_sparse_ + tags: pointwise + +- func: div.out_mode(Tensor self, Tensor other, *, str? rounding_mode, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: div_out_mode + SparseCPU, SparseCUDA, SparseMPS: div_out_sparse_zerodim + tags: pointwise + +# For C++ only, until we have conversion from C++ numbers to Tensor +- func: div.Scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: div + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_div_Scalar + tags: [core, pointwise] + +- func: div_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CompositeExplicitAutograd: div_ + autogen: div.Scalar_out + tags: pointwise + +- func: div.Scalar_mode(Tensor self, Scalar other, *, str? rounding_mode) -> Tensor + variants: function, method + dispatch: + CompositeExplicitAutograd: div + tags: [core, pointwise] + +- func: div_.Scalar_mode(Tensor(a!) self, Scalar other, *, str? rounding_mode) -> Tensor(a!) 
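+# `div.Tensor_mode` backs torch.div's `rounding_mode` argument; the modes only
+# differ on negative quotients. A quick sketch (illustrative, assuming the
+# public torch API):
+#
+#   import torch
+#   a = torch.tensor([7., -7.])
+#   torch.div(a, 2)                          # tensor([ 3.5000, -3.5000])
+#   torch.div(a, 2, rounding_mode="trunc")   # tensor([ 3., -3.])
+#   torch.div(a, 2, rounding_mode="floor")   # tensor([ 3., -4.])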
+ variants: method + dispatch: + CompositeExplicitAutograd: div_ + autogen: div.Scalar_mode_out + tags: pointwise + +# divide, alias for div +- func: divide.Tensor(Tensor self, Tensor other) -> Tensor + variants: function, method + +- func: divide_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + variants: method + +- func: divide.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + +- func: divide.Scalar(Tensor self, Scalar other) -> Tensor + variants: function, method + +- func: divide_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + variants: method + +- func: divide.Tensor_mode(Tensor self, Tensor other, *, str? rounding_mode) -> Tensor + variants: function, method + +- func: divide_.Tensor_mode(Tensor(a!) self, Tensor other, *, str? rounding_mode) -> Tensor(a!) + variants: method + +- func: divide.out_mode(Tensor self, Tensor other, *, str? rounding_mode, Tensor(a!) out) -> Tensor(a!) + +- func: divide.Scalar_mode(Tensor self, Scalar other, *, str? rounding_mode) -> Tensor + variants: function, method + +- func: divide_.Scalar_mode(Tensor(a!) self, Scalar other, *, str? rounding_mode) -> Tensor(a!) + variants: method + + # true_divide, an alias for div +- func: true_divide.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + tags: pointwise + +- func: true_divide_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + +- func: true_divide.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + +- func: true_divide.Scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + +- func: true_divide_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + +- func: dot(Tensor self, Tensor tensor) -> Tensor + variants: function, method + dispatch: + CPU: dot + CUDA: dot_cuda + MPS: dot_mps + +- func: dot.out(Tensor self, Tensor tensor, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: dot_out + +- func: vdot(Tensor self, Tensor other) -> Tensor + variants: function, method + dispatch: + CPU: vdot + CUDA: vdot_cuda + +- func: vdot.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: vdot_out + +- func: einsum(str equation, Tensor[] tensors, *, int[]? path=None) -> Tensor + +- func: embedding(Tensor weight, Tensor indices, SymInt padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> Tensor + dispatch: + CompositeExplicitAutograd: embedding_symint + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_embedding + autogen: embedding.out + tags: core + +- func: embedding_backward(Tensor grad, Tensor indices, SymInt num_weights, SymInt padding_idx, bool scale_grad_by_freq, bool sparse) -> Tensor + dispatch: + CompositeImplicitAutograd: embedding_backward_symint + +- func: embedding_dense_backward(Tensor grad_output, Tensor indices, SymInt num_weights, SymInt padding_idx, bool scale_grad_by_freq) -> Tensor + dispatch: + CPU: embedding_dense_backward_cpu + CUDA: embedding_dense_backward_cuda + MPS: embedding_dense_backward_mps + autogen: embedding_dense_backward.out + tags: core + +- func: embedding_renorm_(Tensor(a!) self, Tensor indices, float max_norm, float norm_type) -> Tensor(a!) 
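+# `embedding` is a row gather from the weight matrix by index. A minimal
+# sketch (illustrative, assuming the public torch API):
+#
+#   import torch
+#   import torch.nn.functional as F
+#   weight = torch.arange(12.).reshape(4, 3)   # 4 embeddings of dim 3
+#   idx = torch.tensor([1, 3])
+#   F.embedding(idx, weight)                   # rows 1 and 3 of weight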
+ dispatch: + CPU: embedding_renorm_cpu_ + CUDA: embedding_renorm_cuda_ + autogen: embedding_renorm, embedding_renorm.out + +- func: embedding_sparse_backward(Tensor grad, Tensor indices, int num_weights, int padding_idx, bool scale_grad_by_freq) -> Tensor + +# NOTE [ embedding_bag Native Functions ] +# The `_embedding_bag.*` variants assume that input tensors except for `weight`, +# e.g. `indices` and `offsets` (and `offset2bag`), are contiguous. +# We really only need to enforce this for `_embedding_bag` (the forward) because +# the backward inputs are the same as forward ones. +# The above `embedding_bag` wrapper is created to achieve this, e.g., +# applying indices = indices.contiguous(). +# The backward functions apply a check that these input tensors are contiguous. + + +- func: _embedding_bag_forward_only(Tensor weight, Tensor indices, Tensor offsets, bool scale_grad_by_freq=False, int mode=0, bool sparse=False, Tensor? per_sample_weights=None, bool include_last_offset=False, int padding_idx=-1) -> (Tensor, Tensor, Tensor, Tensor) + dispatch: + CPU: _embedding_bag_forward_only_cpu + CUDA: _embedding_bag_forward_only_cuda + MPS: _embedding_bag_forward_only_mps + autogen: _embedding_bag_forward_only.out + +- func: _rowwise_prune(Tensor weight, Tensor mask, ScalarType compressed_indices_dtype) -> (Tensor, Tensor) + +# row_stack is the alias of vstack +- func: row_stack(Tensor[] tensors) -> Tensor + +- func: row_stack.out(Tensor[] tensors, *, Tensor(a!) out) -> Tensor(a!) + +- func: embedding_bag(Tensor weight, Tensor indices, Tensor offsets, bool scale_grad_by_freq=False, int mode=0, bool sparse=False, Tensor? per_sample_weights=None, bool include_last_offset=False) -> (Tensor, Tensor, Tensor, Tensor) + +# To keep backward and forward compatibility, and to avoid ambiguity with the +# original signature above, scale_grad_by_freq, mode, sparse, +# per_sample_weights, and include_last_offset parameters do not have default +# values. Once the original signature is removed, default values can be added. +- func: embedding_bag.padding_idx(Tensor weight, Tensor indices, Tensor offsets, bool scale_grad_by_freq, int mode, bool sparse, Tensor? per_sample_weights, bool include_last_offset, int? padding_idx) -> (Tensor, Tensor, Tensor, Tensor) + +- func: _embedding_bag(Tensor weight, Tensor indices, Tensor offsets, bool scale_grad_by_freq=False, int mode=0, bool sparse=False, Tensor? per_sample_weights=None, bool include_last_offset=False, int padding_idx=-1) -> (Tensor, Tensor, Tensor, Tensor) + dispatch: + CPU: _embedding_bag_cpu + CUDA: _embedding_bag_cuda + MPS: _embedding_bag_mps + autogen: _embedding_bag.out + tags: core + +- func: _embedding_bag_backward(Tensor grad, Tensor indices, Tensor offsets, Tensor offset2bag, Tensor bag_size, Tensor maximum_indices, SymInt num_weights, bool scale_grad_by_freq, int mode, bool sparse, Tensor? per_sample_weights, int padding_idx=-1) -> Tensor + dispatch: + CPU, CUDA, MPS: _embedding_bag_backward_symint + +- func: _embedding_bag_sparse_backward(Tensor grad, Tensor indices, Tensor offsets, Tensor offset2bag, Tensor bag_size, SymInt num_weights, bool scale_grad_by_freq, int mode, Tensor? per_sample_weights, int padding_idx=-1) -> Tensor + dispatch: + CompositeImplicitAutograd: _embedding_bag_sparse_backward_symint + +- func: _embedding_bag_dense_backward(Tensor grad, Tensor indices, Tensor offset2bag, Tensor bag_size, Tensor maximum_indices, SymInt num_weights, bool scale_grad_by_freq, int mode, Tensor? 
per_sample_weights, int padding_idx=-1) -> Tensor + dispatch: + CPU: _embedding_bag_dense_backward_cpu + CUDA: _embedding_bag_dense_backward_cuda + MPS: _embedding_bag_dense_backward_mps + autogen: _embedding_bag_dense_backward.out + +- func: _embedding_bag_per_sample_weights_backward(Tensor grad, Tensor weight, Tensor indices, Tensor offsets, Tensor offset2bag, int mode, int padding_idx=-1) -> Tensor + dispatch: + CPU: _embedding_bag_per_sample_weights_backward_cpu + CUDA: _embedding_bag_per_sample_weights_backward_cuda + MPS: _embedding_bag_per_sample_weights_backward_mps + autogen: _embedding_bag_per_sample_weights_backward.out + +- func: empty.names(int[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: empty_names + autogen: empty.names_out + +- func: empty.memory_format(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + dispatch: + CPU: empty_cpu + CUDA: empty_cuda + MPS: empty_mps + Meta: empty_meta_symint + MkldnnCPU: empty_mkldnn + SparseCPU, SparseCUDA, SparseMPS: empty_sparse + SparseMeta: empty_sparse_symint + SparseCsrCPU, SparseCsrCUDA: empty_sparse_compressed + SparseCsrMeta: empty_sparse_compressed_symint + QuantizedCPU, QuantizedCUDA, QuantizedMeta: empty_unknown_quantized + tags: core + +- func: empty_permuted(SymInt[] size, int[] physical_layout, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: empty_permuted_symint + autogen: empty_permuted.out + +# We do not make new_empty a composite that calls into new_empty_strided, as the strided version +# is significantly more difficult to implement by different backends +- func: new_empty(Tensor self, SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + variants: method + dispatch: + CompositeExplicitAutograd: new_empty_symint + autogen: new_empty.out + +- func: new_empty_strided(Tensor self, SymInt[] size, SymInt[] stride, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + variants: method + dispatch: + CompositeExplicitAutogradNonFunctional: new_empty_strided_symint + autogen: new_empty_strided.out + +- func: new_full(Tensor self, SymInt[] size, Scalar fill_value, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + variants: method + dispatch: + # NB: Although this composite mutates on the inside, it is + # non-differentiable so NonFunctional doesn't apply + CompositeExplicitAutograd: new_full + autogen: new_full.out + +- func: new_zeros(Tensor self, SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + variants: method + dispatch: + # NB: Although this composite mutates on the inside, it is + # non-differentiable so NonFunctional doesn't apply + CompositeExplicitAutograd: new_zeros + autogen: new_zeros.out + +- func: new_ones(Tensor self, SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? 
pin_memory=None) -> Tensor + variants: method + dispatch: + # NB: Although this composite mutates on the inside, it is + # non-differentiable so NonFunctional doesn't apply + CompositeExplicitAutograd: new_ones + autogen: new_ones.out + +# other overrides are to provide a more helpful error message that dtype is required +- func: _empty_affine_quantized(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, float scale=1, int zero_point=0, MemoryFormat? memory_format=contiguous_format) -> Tensor + dispatch: + CPU: empty_affine_quantized_other_backends_stub + QuantizedCPU, QuantizedCUDA: empty_affine_quantized + autogen: _empty_affine_quantized.out + +# it's a factory function receiving a tensor argument, thus overriding explicitly +# other overrides are to provide a more helpful error message that dtype is required +- func: _empty_per_channel_affine_quantized(SymInt[] size, *, Tensor scales, Tensor zero_points, int axis, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=contiguous_format) -> Tensor + category_override: factory + dispatch: + CPU: empty_per_channel_affine_quantized_other_backends_stub + QuantizedCPU, QuantizedCUDA: empty_per_channel_affine_quantized + autogen: _empty_per_channel_affine_quantized.out + +- func: resize_(Tensor(a!) self, SymInt[] size, *, MemoryFormat? memory_format=None) -> Tensor(a!) + use_const_ref_for_mutable_tensors: True + variants: method + device_check: NoCheck + device_guard: False + tags: [core, inplace_view] + dispatch: + Meta: resize__symint + CPU: resize_ + CUDA: resize_cuda_ + MPS: resize_mps_ + QuantizedCPU: quantized_resize_cpu_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: resize_sparse_csr_ + autogen: resize, resize.out + +# This is a utility function to enable users to resize out tensor while registering kernels for out variants. +# Eventually, we can consider exposing `resize_output` as a public API to ship it with python op registration +# to make it easy to register out variants for ops. +- func: _resize_output_(Tensor(a!) self, SymInt[] size, Device device) -> Tensor(a!) + use_const_ref_for_mutable_tensors: True + variants: function + dispatch: + Meta: _resize_output_ + autogen: _resize_output, _resize_output.out + +- func: empty_quantized(int[] size, Tensor qtensor, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + category_override: factory + variants: function + dispatch: + QuantizedCPU, QuantizedCUDA: empty_quantized + autogen: empty_quantized.out + +- func: empty.out(SymInt[] size, *, MemoryFormat? memory_format=None, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck + device_guard: False + +- func: empty_like(Tensor self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: empty_like + QuantizedCPU, QuantizedCUDA: empty_like_quantized + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: empty_like_sparse_coo + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: empty_like_sparse_csr + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: empty_like_nested + autogen: empty_like.out + +- func: empty_strided(SymInt[] size, SymInt[] stride, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? 
pin_memory=None) -> Tensor + dispatch: + CPU: empty_strided_cpu + CUDA: empty_strided_cuda + MPS: empty_strided_mps + Meta: empty_strided_meta_symint + QuantizedCPU, QuantizedCUDA: empty_strided_unknown_quantized + autogen: empty_strided.out + tags: core + +- func: erf(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: erf.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: erf_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: erf_sparse_csr + tags: [core, pointwise] + +- func: erf_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: erf.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: erf_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: erf_sparse_csr_ + tags: pointwise + +- func: erf.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS, MTIA: erf_out + SparseCPU, SparseCUDA, SparseMPS: erf_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: erf_sparse_csr_out + tags: pointwise + +- func: erfc(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: erfc.out + variants: function, method + tags: pointwise + +- func: erfc_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: erfc.out + variants: function, method + tags: pointwise + +- func: erfc.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: erfc_out + tags: pointwise + +- func: exp(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: exp.out + variants: function, method + tags: [core, pointwise] + +- func: exp_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: exp.out + variants: function, method + tags: pointwise + +- func: exp.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS, MTIA: exp_out + tags: pointwise + +- func: exp2(Tensor self) -> Tensor + structured_delegate: exp2.out + variants: function, method + tags: pointwise + +- func: exp2_(Tensor(a!) self) -> Tensor(a!) + structured_delegate: exp2.out + variants: function, method + tags: pointwise + +- func: exp2.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: exp2_out + tags: pointwise + +- func: expm1(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: expm1.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: expm1_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: expm1_sparse_csr + tags: [core, pointwise] + +- func: expm1_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: expm1.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: expm1_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: expm1_sparse_csr_ + tags: pointwise + +- func: expm1.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) 
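+# `expm1` exists alongside `exp` because computing exp(x) - 1 directly loses
+# precision near zero. A quick sketch (illustrative, assuming the public torch API):
+#
+#   import torch
+#   x = torch.tensor(1e-10, dtype=torch.float64)
+#   torch.exp(x) - 1    # ~1e-10, but with rounding error in the low bits
+#   torch.expm1(x)      # 1e-10, computed accurately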
+ device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: expm1_out + SparseCPU, SparseCUDA, SparseMPS: expm1_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: expm1_sparse_csr_out + tags: pointwise + +- func: expand(Tensor(a) self, SymInt[] size, *, bool implicit=False) -> Tensor(a) + variants: method # This is method-only to match the previous tensor API. In the future we could make this a function too. + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: expand + tags: core + +- func: expand_as(Tensor(a) self, Tensor other) -> Tensor(a) + variants: method # This is method-only to match the previous tensor API. In the future we could make this a function too. + device_check: NoCheck + device_guard: False + +# decomposes to eye.m +- func: eye(SymInt n, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: eye + +- func: eye.m(SymInt n, SymInt m, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: eye + +- func: eye.out(SymInt n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, Meta: eye_out_cpu + CUDA: eye_out_cuda + MPS: eye_out_mps + +- func: eye.m_out(SymInt n, SymInt m, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, Meta: eye_out_cpu + CUDA: eye_out_cuda + MPS: eye_out_mps + +- func: flatten.using_ints(Tensor(a) self, int start_dim=0, int end_dim=-1) -> Tensor(a) + variants: function, method + +- func: flatten.named_out_dim(Tensor(a) self, int start_dim, int end_dim, Dimname out_dim) -> Tensor(a) + variants: function, method + +- func: flatten.using_names(Tensor(a) self, Dimname start_dim, Dimname end_dim, Dimname out_dim) -> Tensor(a) + variants: function, method + +- func: flatten.DimnameList(Tensor(a) self, Dimname[] dims, Dimname out_dim) -> Tensor(a) + variants: function, method + +- func: unflatten.int(Tensor(a) self, int dim, SymInt[] sizes) -> Tensor(a) + variants: function, method + dispatch: + CompositeImplicitAutograd: unflatten_symint + +- func: unflatten.Dimname(Tensor(a) self, Dimname dim, SymInt[] sizes, Dimname[] names) -> Tensor(a) + variants: function, method + dispatch: + CompositeImplicitAutograd: unflatten_dimname_symint + +- func: fill.Scalar(Tensor self, Scalar value) -> Tensor + variants: function + dispatch: + CompositeExplicitAutograd: fill + tags: core + +- func: fill.Tensor(Tensor self, Tensor value) -> Tensor + variants: function + dispatch: + CompositeExplicitAutograd: fill + +- func: fill_.Scalar(Tensor(a!) self, Scalar value) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CPU, CUDA: fill_ + MPS: fill_scalar_mps + QuantizedCPU, QuantizedCUDA: fill_quantized_ + Meta: fill_meta_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: fill_sparse_csr_ + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: fill_nested_ + autogen: fill.Scalar_out + +- func: fill_.Tensor(Tensor(a!) self, Tensor value) -> Tensor(a!) 
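+# `expand` returns a broadcasting view without copying data. A minimal sketch
+# (illustrative, assuming the public torch API):
+#
+#   import torch
+#   x = torch.tensor([[1.], [2.]])   # shape (2, 1)
+#   y = x.expand(2, 3)               # view of shape (2, 3); no allocation
+#   torch.flatten(y)                 # tensor([1., 1., 1., 2., 2., 2.])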
+ device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CPU, CUDA: fill_ + MPS: fill_tensor_mps_ + QuantizedCPU, QuantizedCUDA: fill_quantized_ + Meta: fill_meta_ + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: fill_nested_ + autogen: fill.Tensor_out + +- func: floor(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: floor.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: floor_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: floor_sparse_csr + tags: [core, pointwise] + +- func: floor_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: floor.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: floor_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: floor_sparse_csr_ + tags: pointwise + +- func: floor.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: floor_out + SparseCPU, SparseCUDA, SparseMPS: floor_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: floor_sparse_csr_out + tags: pointwise + +- func: floor_divide(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CPU, CUDA, MPS, MTIA: floor_divide + SparseCPU, SparseCUDA, SparseMPS: floor_divide_sparse + +- func: floor_divide_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CPU, CUDA, MPS: floor_divide_ + SparseCPU, SparseCUDA, SparseMPS: floor_divide_sparse_ + +- func: floor_divide.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MPS, MTIA: floor_divide_out + SparseCPU, SparseCUDA, SparseMPS: floor_divide_out_sparse_zerodim + +- func: floor_divide.Scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: floor_divide + +- func: floor_divide_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CompositeExplicitAutograd: floor_divide_ + autogen: floor_divide.Scalar_out + +- func: frac(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: frac.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: frac_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: frac_sparse_csr + tags: pointwise + +- func: frac_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: frac.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: frac_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: frac_sparse_csr_ + tags: pointwise + +- func: frac.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: frac_out + MPS: frac_out_mps + SparseCPU, SparseCUDA, SparseMPS: frac_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: frac_sparse_csr_out + tags: pointwise + +- func: full.names(int[] size, Scalar fill_value, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? 
pin_memory=None) -> Tensor + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: full + autogen: full.names_out + +- func: full(SymInt[] size, Scalar fill_value, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: full + tags: core + +- func: full.out(SymInt[] size, Scalar fill_value, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: full_out + +- func: full_like(Tensor self, Scalar fill_value, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + dispatch: + # NB: Although this composite mutates on the inside, it is + # non-differentiable so NonFunctional doesn't apply + CompositeExplicitAutograd: full_like + autogen: full_like.out + tags: core + +- func: from_file(str filename, bool? shared=None, int? size=0, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CPU: from_file + autogen: from_file.out + +- func: gcd.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: gcd_out + tags: pointwise + +- func: gcd(Tensor self, Tensor other) -> Tensor + structured_delegate: gcd.out + variants: function, method + tags: pointwise + +- func: gcd_(Tensor(a!) self, Tensor other) -> Tensor(a!) + structured_delegate: gcd.out + variants: function, method + +- func: lcm.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: lcm_out + tags: pointwise + +- func: lcm(Tensor self, Tensor other) -> Tensor + structured_delegate: lcm.out + variants: function, method + tags: pointwise + +- func: lcm_(Tensor(a!) self, Tensor other) -> Tensor(a!) + structured_delegate: lcm.out + variants: function, method + +# NOTE [ grid_sampler Native Functions ] +# `grid_sampler` is _supposed to_ do all the shape checking and then dispatch to +# one of `cudnn_grid_sampler`, `grid_sampler_2d`, or `grid_sampler_3d`, each of +# which has the corresponding backward defined as native functions as well. +# However, we do shape checking everywhere for now since each of the mentioned +# functions can be called directly, which will lead to crashes otherwise. +# See https://github.com/pytorch/pytorch/issues/73187 for more information. +# +# There is also _grid_sampler_2d_backward_cpu_fallback which is an +# implementation detail of grid_sampler_2d and is only exposed here for testing +# purposes. +# +# Additionally, arguments `padding_mode` and `interpolation_mode` are cast to +# enums defined in `native/GridSampler.h`. `cudnn_grid_sampler` doesn't take in +# `interpolation_mode` because it only supports Bilinear interpolation mode. +# Nor does it take in `align_corners` because it only supports the mode +# `align_corners = True`. +- func: grid_sampler(Tensor input, Tensor grid, int interpolation_mode, int padding_mode, bool align_corners) -> Tensor + +- func: grid_sampler_2d(Tensor input, Tensor grid, int interpolation_mode, int padding_mode, bool align_corners) -> Tensor + dispatch: + CPU, QuantizedCPU: grid_sampler_2d_cpu + CUDA: grid_sampler_2d_cuda + MPS: grid_sampler_2d_mps + autogen: grid_sampler_2d.out + tags: core + +# `grid_sampler_2d_backward` takes in `output_mask` to optimize performance for +# the case where `input` doesn't require gradient. 
+# `grid_sampler_2d_backward` takes in `output_mask` to optimize performance for
+# the case where `input` doesn't require gradient. Gradient for `grid` is always
+# computed (only `output_mask[0]` is checked by the implementations).
+- func: grid_sampler_2d_backward(Tensor grad_output, Tensor input, Tensor grid, int interpolation_mode, int padding_mode, bool align_corners, bool[2] output_mask) -> (Tensor, Tensor)
+  dispatch:
+    CPU: grid_sampler_2d_backward_cpu
+    CUDA: grid_sampler_2d_backward_cuda
+  autogen: grid_sampler_2d_backward.out
+
+# See NOTE [ grid_sample CPU fallback ]
+- func: _grid_sampler_2d_cpu_fallback(Tensor input, Tensor grid, int interpolation_mode, int padding_mode, bool align_corners) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: _grid_sampler_2d_cpu_fallback
+  autogen: _grid_sampler_2d_cpu_fallback.out
+
+- func: _grid_sampler_2d_cpu_fallback_backward(Tensor grad_output, Tensor input, Tensor grid, int interpolation_mode, int padding_mode, bool align_corners) -> (Tensor, Tensor)
+
+- func: grid_sampler_3d(Tensor input, Tensor grid, int interpolation_mode, int padding_mode, bool align_corners) -> Tensor
+  dispatch:
+    CPU: grid_sampler_3d_cpu
+    CUDA: grid_sampler_3d_cuda
+    MPS: grid_sampler_3d_mps
+  autogen: grid_sampler_3d.out
+
+# `grid_sampler_3d_backward` takes in `output_mask` to optimize performance for
+# the case where `input` doesn't require gradient. Gradient for `grid` is always
+# computed (only `output_mask[0]` is checked by the implementations).
+- func: grid_sampler_3d_backward(Tensor grad_output, Tensor input, Tensor grid, int interpolation_mode, int padding_mode, bool align_corners, bool[2] output_mask) -> (Tensor, Tensor)
+  dispatch:
+    CPU: grid_sampler_3d_backward_cpu
+    CUDA: grid_sampler_3d_backward_cuda
+  autogen: grid_sampler_3d_backward.out
+
+- func: hann_window(int window_length, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: hann_window
+  autogen: hann_window.out
+
+- func: hann_window.periodic(int window_length, bool periodic, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: hann_window
+  autogen: hann_window.periodic_out
+
+- func: hamming_window(int window_length, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: hamming_window
+  autogen: hamming_window.out
+
+- func: hamming_window.periodic(int window_length, bool periodic, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: hamming_window
+  autogen: hamming_window.periodic_out
+
+- func: hamming_window.periodic_alpha(int window_length, bool periodic, float alpha, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: hamming_window
+  autogen: hamming_window.periodic_alpha_out
+
+- func: hamming_window.periodic_alpha_beta(int window_length, bool periodic, float alpha, float beta, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: hamming_window
+  autogen: hamming_window.periodic_alpha_beta_out
+
+- func: kaiser_window(int window_length, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: kaiser_window
+  autogen: kaiser_window.out
+
+- func: kaiser_window.periodic(int window_length, bool periodic, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: kaiser_window
+  autogen: kaiser_window.periodic_out
+
+- func: kaiser_window.beta(int window_length, bool periodic, float beta, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: kaiser_window
+  autogen: kaiser_window.beta_out
+
+- func: hinge_embedding_loss(Tensor self, Tensor target, float margin=1.0, int reduction=Mean) -> Tensor
+
+- func: group_norm(Tensor input, int num_groups, Tensor? weight=None, Tensor? bias=None, float eps=1e-05, bool cudnn_enabled=True) -> Tensor
+
+- func: native_group_norm(Tensor input, Tensor? weight, Tensor? bias, SymInt N, SymInt C, SymInt HxW, int group, float eps) -> (Tensor, Tensor, Tensor)
+  dispatch:
+    CPU, CUDA: native_group_norm
+    CompositeExplicitAutograd: math_group_norm
+  autogen: native_group_norm.out
+  tags: core
+
+- func: native_group_norm_backward(Tensor grad_out, Tensor input, Tensor mean, Tensor rstd, Tensor? weight, SymInt N, SymInt C, SymInt HxW, int group, bool[3] output_mask) -> (Tensor, Tensor, Tensor)
+  dispatch:
+    CPU, CUDA: native_group_norm_backward
+  autogen: native_group_norm_backward.out
+  tags: core
+
+# Real to complex forward FFT
+- func: _fft_r2c(Tensor self, int[] dim, int normalization, bool onesided) -> Tensor
+  variants: function
+  dispatch:
+    CPU: _fft_r2c_mkl
+    CUDA: _fft_r2c_cufft
+    MPS: _fft_r2c_mps
+  tags: core
+
+- func: _fft_r2c.out(Tensor self, int[] dim, int normalization, bool onesided, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  dispatch:
+    CPU: _fft_r2c_mkl_out
+    CUDA: _fft_r2c_cufft_out
+    MPS: _fft_r2c_mps_out
+
+# Complex to real inverse FFT
+- func: _fft_c2r(Tensor self, int[] dim, int normalization, SymInt last_dim_size) -> Tensor
+  variants: function
+  dispatch:
+    CPU: _fft_c2r_mkl
+    CUDA: _fft_c2r_cufft
+    MPS: _fft_c2r_mps
+
+- func: _fft_c2r.out(Tensor self, int[] dim, int normalization, SymInt last_dim_size, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  dispatch:
+    CPU: _fft_c2r_mkl_out
+    CUDA: _fft_c2r_cufft_out
+    MPS: _fft_c2r_mps_out
+
+# Standard complex to complex FFT (forward or backward)
+- func: _fft_c2c(Tensor self, SymInt[] dim, int normalization, bool forward) -> Tensor
+  variants: function
+  dispatch:
+    CPU: _fft_c2c_mkl
+    CUDA: _fft_c2c_cufft
+    MPS: _fft_c2c_mps
+
+- func: _fft_c2c.out(Tensor self, SymInt[] dim, int normalization, bool forward, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  dispatch:
+    CPU: _fft_c2c_mkl_out
+    CUDA: _fft_c2c_cufft_out
+    MPS: _fft_c2c_mps_out
+
+- func: _validate_compressed_sparse_indices(bool is_crow, Tensor compressed_idx, Tensor plain_idx, int cdim, int dim, int nnz) -> ()
+  device_check: NoCheck
+  variants: function
+  dispatch:
+    CPU: _validate_compressed_sparse_indices_cpu
+    CUDA: _validate_compressed_sparse_indices_cuda
+
+- func: _cufft_get_plan_cache_size(DeviceIndex device_index) -> int
+
+- func: _cufft_get_plan_cache_max_size(DeviceIndex device_index) -> int
+
+- func: _cufft_set_plan_cache_max_size(DeviceIndex device_index, int max_size) -> ()
+
+- func: _cufft_clear_plan_cache(DeviceIndex device_index) -> ()
+
+- func: index.Tensor(Tensor self, Tensor?[] indices) -> Tensor
+  device_check: NoCheck # TensorIterator
+  structured_delegate: index.Tensor_out
+  variants: function, method
+  dispatch:
+    QuantizedCPU: quantized_index
+  tags: [core, dynamic_output_shape]
+  # NB: This function is special-cased in tools/autograd/gen_variable_type.py
+  # NB: The following functions are declared in aten/src/ATen/templates/TensorBody.h and defined in aten/src/ATen/TensorIndexing.cpp:
+  # - Tensor Tensor::index(ArrayRef indices)
+  # - Tensor Tensor::index(std::initializer_list indices)
+
+- func: index.Tensor_out(Tensor self, Tensor?[] indices, *, Tensor(a!) out) -> Tensor(a!)
+  device_check: NoCheck
+  structured: True
+  structured_inherits: TensorIteratorBase
+  precomputed:
+  - indices -> DimVector sizes, DimVector strides
+  dispatch:
+    CPU, CUDA, MPS: index_out
+
+# Used by inductor to signal indexing without bounds checks
+# Note that we don't support boolean indexing, to avoid dynamic output shapes
+- func: _unsafe_index.Tensor(Tensor self, Tensor?[] indices) -> Tensor
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: _unsafe_index
+
+# Used by inductor to generate masked loads
+# Note that we don't support boolean indexing, to avoid dynamic output shapes
+- func: _unsafe_masked_index(Tensor self, Tensor mask, Tensor?[] indices, Scalar fill) -> Tensor
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: _unsafe_masked_index
+
+- func: _unsafe_masked_index_put_accumulate(Tensor self, Tensor mask, Tensor?[] indices, Tensor values) -> Tensor
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: _unsafe_masked_index_put_accumulate
+
+- func: index_copy.out(Tensor self, int dim, Tensor index, Tensor source, *, Tensor(a!) out) -> Tensor(a!)
+  structured: True
+  variants: function
+  precomputed:
+  - dim -> int dim
+  dispatch:
+    CPU, CUDA: index_copy_out
+    MPS: index_copy_out_mps
+
+- func: index_copy_(Tensor(a!) self, int dim, Tensor index, Tensor source) -> Tensor(a!)
+  variants: method
+  structured_delegate: index_copy.out
+
+- func: index_copy(Tensor self, int dim, Tensor index, Tensor source) -> Tensor
+  variants: function, method
+  structured_delegate: index_copy.out
+
+- func: index_copy_.dimname(Tensor(a!) self, Dimname dim, Tensor index, Tensor source) -> Tensor(a!)
+  variants: method
+
+- func: index_copy.dimname(Tensor self, Dimname dim, Tensor index, Tensor source) -> Tensor
+  variants: function, method
+
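+# Illustrative usage (editor's sketch, not part of the schema): index_put_
+# below is the op behind `tensor[indices] = values`; with accumulate=True
+# repeated indices add instead of overwriting:
+#
+#   import torch
+#   t = torch.zeros(5)
+#   idx = torch.tensor([0, 1, 1])
+#   t.index_put_((idx,), torch.ones(3), accumulate=True)
+#   # t is now tensor([1., 2., 0., 0., 0.])
+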
+- func: index_put_(Tensor(a!) self, Tensor?[] indices, Tensor values, bool accumulate=False) -> Tensor(a!)
+  device_check: NoCheck # delegate to _index_put_impl_, which leverages TensorIterator
+  variants: function, method
+  dispatch:
+    CompositeExplicitAutograd: index_put_
+  autogen: index_put.out
+  # NB: The following functions are declared in aten/src/ATen/templates/TensorBody.h and defined in aten/src/ATen/TensorIndexing.cpp:
+  # - Tensor & Tensor::index_put_(ArrayRef indices, Tensor const & rhs)
+  # - Tensor & Tensor::index_put_(ArrayRef indices, Scalar v)
+  # - Tensor & Tensor::index_put_(std::initializer_list indices, Tensor const & rhs)
+  # - Tensor & Tensor::index_put_(std::initializer_list indices, Scalar v)
+
+- func: index_put(Tensor self, Tensor?[] indices, Tensor values, bool accumulate=False) -> Tensor
+  device_check: NoCheck # delegate to _index_put_impl_ after clone, which leverages TensorIterator
+  variants: function, method
+  dispatch:
+    CompositeExplicitAutograd: index_put
+  tags: core
+
+- func: _unsafe_index_put(Tensor self, Tensor?[] indices, Tensor values, bool accumulate=False) -> Tensor
+  device_check: NoCheck # delegate to _index_put_impl_ after clone, which leverages TensorIterator
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: _unsafe_index_put
+
+- func: _index_put_impl_(Tensor(a!) self, Tensor?[] indices, Tensor values, bool accumulate=False, bool unsafe=False) -> Tensor(a!)
+  device_check: NoCheck # TensorIterator
+  variants: function
+  dispatch:
+    CPU, CUDA, MPS: _index_put_impl_
+    QuantizedCPU: _index_put_impl_quantized_cpu_
+    QuantizedCUDA: _index_put_impl_quantized_cuda_
+  autogen: _index_put_impl, _index_put_impl.out
+
+- func: instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> Tensor
+  variants: function
+
+- func: isclose(Tensor self, Tensor other, float rtol=1e-05, float atol=1e-08, bool equal_nan=False) -> Tensor
+  variants: function, method
+
+- func: isin.Tensor_Tensor_out(Tensor elements, Tensor test_elements, *, bool assume_unique=False, bool invert=False, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  structured: True
+  dispatch:
+    CPU, CUDA: isin_Tensor_Tensor_out
+    MPS: isin_Tensor_Tensor_out_mps
+
+- func: isin.Tensor_Tensor(Tensor elements, Tensor test_elements, *, bool assume_unique=False, bool invert=False) -> Tensor
+  variants: function
+  structured_delegate: isin.Tensor_Tensor_out
+
+- func: isin.Tensor_Scalar_out(Tensor elements, Scalar test_element, *, bool assume_unique=False, bool invert=False, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  structured: True
+  dispatch:
+    CPU, CUDA, MPS: isin_Tensor_Scalar_out
+
+- func: isin.Tensor_Scalar(Tensor elements, Scalar test_element, *, bool assume_unique=False, bool invert=False) -> Tensor
+  variants: function
+  structured_delegate: isin.Tensor_Scalar_out
+
+- func: isin.Scalar_Tensor_out(Scalar element, Tensor test_elements, *, bool assume_unique=False, bool invert=False, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  structured: True
+  dispatch:
+    CPU, CUDA: isin_Scalar_Tensor_out
+    MPS: isin_Scalar_Tensor_out_mps
+
+- func: isin.Scalar_Tensor(Scalar element, Tensor test_elements, *, bool assume_unique=False, bool invert=False) -> Tensor
+  variants: function
+  structured_delegate: isin.Scalar_Tensor_out
+
+- func: isnan(Tensor self) -> Tensor
+  variants: function, method
+  device_check: NoCheck
+  device_guard: False
+  dispatch:
+    CPU, CUDA, MPS, MTIA: isnan
+    NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_isnan
+    SparseCPU, SparseCUDA, SparseMPS: isnan_sparse
+    SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: isnan_sparse_csr
+  autogen: isnan.out
+  tags: [core, pointwise]
+
+- func: is_distributed(Tensor self) -> bool
+  variants: function, method
+  device_check: NoCheck
+  device_guard: False
+
+- func: is_floating_point(Tensor self) -> bool
+  variants: function, method
+  device_check: NoCheck
+  device_guard: False
+  manual_cpp_binding: True
+
+- func: is_complex(Tensor self) -> bool
+  variants: function, method
+  device_check: NoCheck
+  device_guard: False
+  manual_cpp_binding: True
+
+- func: is_conj(Tensor self) -> bool
+  variants: function, method
+  device_guard: False
+  manual_cpp_binding: True
+
+- func: _is_zerotensor(Tensor self) -> bool
+  variants: function, method
+  device_guard: False
+  manual_cpp_binding: True
+
+- func: is_neg(Tensor self) -> bool
+  variants: function, method
+  device_guard: False
+  manual_cpp_binding: True
+
+- func: isreal(Tensor self) -> Tensor
+  variants: function, method
+
+- func: is_nonzero(Tensor self) -> bool
+  variants: function, method
+  device_check: NoCheck
+  device_guard: False
+
+- func: is_same_size(Tensor self, Tensor other) -> bool
+  variants: function, method
+  device_check: NoCheck
+  device_guard: False
+  dispatch:
+    NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: nested_is_same_size
+    CompositeExplicitAutograd: is_same_size
+
+- func: is_signed(Tensor self) -> bool
+  variants: function, method
+  device_check: NoCheck
+  device_guard: False
+  manual_cpp_binding: True
+
+- func: is_inference(Tensor self) -> bool
+  variants: function, method
+  device_check: NoCheck
+  device_guard: False
+  manual_cpp_binding: True
+
+- func: kl_div(Tensor self, Tensor target, int reduction=Mean, *, bool log_target=False) -> Tensor
+
+- func: kron(Tensor self, Tensor other) -> Tensor
+  variants: function, method
+
+- func: kron.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
+
+- func: kthvalue(Tensor self, SymInt k, int dim=-1, bool keepdim=False) -> (Tensor values, Tensor indices)
+  variants: function, method
+  dispatch:
+    CompositeExplicitAutograd: kthvalue
+
+- func: kthvalue.values(Tensor self, SymInt k, int dim=-1, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices)
+  dispatch:
+    CPU: kthvalue_out_cpu
+    CUDA: kthvalue_out_cuda
+    MPS: kthvalue_out_mps
+
+- func: kthvalue.dimname(Tensor self, SymInt k, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices)
+  variants: function, method
+
+- func: kthvalue.dimname_out(Tensor self, SymInt k, Dimname dim, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices)
+
+- func: layer_norm(Tensor input, SymInt[] normalized_shape, Tensor? weight=None, Tensor? bias=None, float eps=1e-05, bool cudnn_enable=True) -> Tensor
+  dispatch:
+    CompositeImplicitAutograd: layer_norm_symint
+
+- func: native_layer_norm(Tensor input, SymInt[] normalized_shape, Tensor? weight, Tensor? bias, float eps) -> (Tensor, Tensor, Tensor)
+  dispatch:
+    CPU: layer_norm_cpu
+    CUDA: layer_norm_cuda
+    MPS: layer_norm_mps
+    CompositeExplicitAutograd: math_native_layer_norm
+    NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: nested_layer_norm
+  autogen: native_layer_norm.out
+  tags: core
+
+- func: native_layer_norm_backward(Tensor grad_out, Tensor input, SymInt[] normalized_shape, Tensor mean, Tensor rstd, Tensor? weight, Tensor? bias, bool[3] output_mask) -> (Tensor, Tensor, Tensor)
+  dispatch:
+    CPU: layer_norm_backward_cpu
+    CUDA: layer_norm_backward_cuda
+    MPS: layer_norm_backward_mps
+    NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: layer_norm_backward_nested
+  autogen: native_layer_norm_backward.out
+  tags: core
+
+- func: rms_norm(Tensor input, SymInt[] normalized_shape, Tensor? weight=None, float? eps=None) -> Tensor
+  dispatch:
+    CompositeImplicitAutograd: rms_norm_symint
+
+- func: _fused_rms_norm(Tensor input, int[] normalized_shape, Tensor? weight, float? eps) -> (Tensor, Tensor)
+  dispatch:
+    CUDA: _fused_rms_norm_cuda
+    MPS: _fused_rms_norm_mps
+    CompositeImplicitAutograd: rms_norm_composite
+
+- func: _fused_rms_norm_backward(Tensor grad_out, Tensor input, int[] normalized_shape, Tensor rstd, Tensor? weight, bool[2] output_mask) -> (Tensor, Tensor)
+  dispatch:
+    CUDA: _fused_rms_norm_backward_cuda
+
+- func: nan_to_num(Tensor self, float? nan=None, float? posinf=None, float? neginf=None) -> Tensor
+  variants: function, method
+  dispatch:
+    CompositeExplicitAutograd: nan_to_num
+    SparseCPU, SparseCUDA, SparseMPS: nan_to_num_sparse
+  tags: pointwise
+
+- func: nan_to_num_(Tensor(a!) self, float? nan=None, float? posinf=None, float? neginf=None) -> Tensor(a!)
+  variants: function, method
+  dispatch:
+    CompositeExplicitAutograd: nan_to_num_
+    SparseCPU, SparseCUDA, SparseMPS: nan_to_num_sparse_
+  tags: pointwise
+
+- func: nan_to_num.out(Tensor self, float? nan=None, float? posinf=None, float? neginf=None, *, Tensor(a!) out) -> Tensor(a!)
+  dispatch:
+    CPU, CUDA, MTIA: nan_to_num_out
+    MPS: nan_to_num_out_mps
+    SparseCPU, SparseCUDA, SparseMPS: nan_to_num_sparse_out
+  tags: pointwise
+
+- func: linear(Tensor input, Tensor weight, Tensor? bias=None) -> Tensor
+  python_module: nn
+  dispatch:
+    CompositeImplicitAutograd: linear
+    NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: nested_linear
+    MPS: _mps_linear
+
+- func: linear_backward(Tensor self, Tensor grad_output, Tensor weight, bool[3] output_mask) -> (Tensor, Tensor, Tensor)
+  dispatch:
+    NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: nested_linear_backward
+    MPS: mps_linear_backward
+  autogen: linear_backward.out
+
+- func: linear.out(Tensor input, Tensor weight, Tensor? bias=None, *, Tensor(a!) out) -> Tensor(a!)
+  python_module: nn
+  dispatch:
+    CompositeExplicitAutograd: linear_out
+
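+# Illustrative usage (editor's sketch, not part of the schema): linear
+# computes input @ weight.T + bias, with the dispatch keys above selecting
+# nested-tensor and MPS kernels behind the same signature:
+#
+#   import torch
+#   import torch.nn.functional as F
+#   x = torch.randn(8, 3)   # (batch, in_features)
+#   w = torch.randn(4, 3)   # (out_features, in_features)
+#   b = torch.randn(4)
+#   y = F.linear(x, w, b)   # shape (8, 4)
+#   assert torch.allclose(y, x @ w.T + b, atol=1e-6)
+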
+- func: mkldnn_linear(Tensor self, Tensor weight, Tensor? bias=None) -> Tensor
+  python_module: nn
+  dispatch:
+    MkldnnCPU: mkldnn_linear
+  autogen: mkldnn_linear.out
+
+- func: mkldnn_linear_backward_input(int[] input_size, Tensor grad_output, Tensor weight) -> Tensor
+  dispatch:
+    MkldnnCPU: mkldnn_linear_backward_input
+  autogen: mkldnn_linear_backward_input.out
+
+- func: mkldnn_linear_backward_weights(Tensor grad_output, Tensor input, Tensor weight, bool bias_defined) -> (Tensor, Tensor)
+  dispatch:
+    MkldnnCPU: mkldnn_linear_backward_weights
+  autogen: mkldnn_linear_backward_weights.out
+
+- func: mkldnn_linear_backward(Tensor self, Tensor grad_output, Tensor weight, bool[3] output_mask) -> (Tensor, Tensor, Tensor)
+  dispatch:
+    MkldnnCPU: mkldnn_linear_backward
+  autogen: mkldnn_linear_backward.out
+
+- func: _cslt_compress(Tensor input) -> Tensor
+  dispatch:
+    CUDA: _cslt_compress
+
+- func: _cslt_sparse_mm(Tensor compressed_A, Tensor dense_B, Tensor? bias=None, Tensor? alpha=None, ScalarType? out_dtype=None, bool transpose_result=False, int alg_id=0, int split_k=1, int split_k_mode=-1) -> Tensor
+  dispatch:
+    CUDA: _cslt_sparse_mm
+  tags: needs_fixed_stride_order
+
+- func: _cslt_sparse_mm_search(Tensor compressed_A, Tensor dense_B, Tensor? bias=None, Tensor? alpha=None, ScalarType? out_dtype=None, bool transpose_result=False) -> int
+  dispatch:
+    CUDA: _cslt_sparse_mm_search
+
+- func: _sparse_semi_structured_tile(Tensor input, str algorithm="", bool use_cutlass=True) -> (Tensor, Tensor, Tensor, Tensor, Tensor)
+  dispatch:
+    CUDA: _sparse_semi_structured_tile
+
+- func: _sparse_semi_structured_apply(Tensor input, Tensor thread_masks) -> (Tensor, Tensor)
+  dispatch:
+    CUDA: _sparse_semi_structured_apply
+
+- func: _sparse_semi_structured_apply_dense(Tensor input, Tensor thread_masks) -> Tensor
+  dispatch:
+    CUDA: _sparse_semi_structured_apply_dense
+
+# DEPRECATED: Use torch._sparse_semi_structured_mm/torch._sparse_semi_structured_addmm instead
+- func: _sparse_semi_structured_linear(Tensor input, Tensor weight, Tensor meta, *, Tensor? bias=None, str? activation=None, ScalarType? out_dtype=None) -> Tensor
+  dispatch:
+    CUDA: _sparse_semi_structured_linear
+
+- func: _sparse_semi_structured_mm(Tensor mat1, Tensor mat1_meta, Tensor mat2, *, ScalarType? out_dtype=None) -> Tensor
+  dispatch:
+    CUDA: _sparse_semi_structured_mm
+
+- func: _sparse_semi_structured_addmm(Tensor input, Tensor mat1, Tensor mat1_meta, Tensor mat2, *, Scalar alpha=1, Scalar beta=1, ScalarType? out_dtype=None) -> Tensor
+  dispatch:
+    CUDA: _sparse_semi_structured_addmm
+
+- func: _mixed_dtypes_linear(Tensor input, Tensor weight, Tensor scale, *, Tensor? bias=None, str? activation=None) -> Tensor
+  dispatch:
+    CUDA: _mixed_dtypes_linear
+
+- func: fbgemm_linear_int8_weight_fp32_activation(Tensor input, Tensor weight, Tensor packed, Tensor col_offsets, Scalar weight_scale, Scalar weight_zero_point, Tensor bias) -> Tensor
+
+- func: fbgemm_linear_int8_weight(Tensor input, Tensor weight, Tensor packed, Tensor col_offsets, Scalar weight_scale, Scalar weight_zero_point, Tensor bias) -> Tensor
+
+- func: fbgemm_linear_quantize_weight(Tensor input) -> (Tensor, Tensor, float, int)
+
+- func: fbgemm_pack_gemm_matrix_fp16(Tensor input) -> Tensor
+
+- func: _wrapped_linear_prepack(Tensor weight, Tensor weight_scale, Tensor weight_zero_point, Tensor bias) -> Tensor
+
+- func: _wrapped_quantized_linear_prepacked(Tensor input, Tensor input_scale, Tensor input_zero_point, Tensor packed_weight, Tensor output_scale, Tensor output_zero_point, int out_channel) -> Tensor
+
+- func: fbgemm_linear_fp16_weight_fp32_activation(Tensor input, Tensor packed_weight, Tensor? bias) -> Tensor
+
+- func: fbgemm_linear_fp16_weight_fp32_activation.out(Tensor input, Tensor packed_weight, Tensor? bias, Tensor(a!) output) -> Tensor
+
+- func: fbgemm_linear_fp16_weight(Tensor input, Tensor packed_weight, Tensor bias) -> Tensor
+
+- func: fbgemm_linear_fp16_weight.out(Tensor input, Tensor packed_weight, Tensor bias, Tensor(a!) output) -> Tensor
+
+- func: fbgemm_pack_quantized_matrix(Tensor input) -> Tensor
+
+- func: fbgemm_pack_quantized_matrix.KN(Tensor input, int K, int N) -> Tensor
+
+- func: ldexp.Tensor(Tensor self, Tensor other) -> Tensor
+  variants: function, method
+
+- func: ldexp_(Tensor(a!) self, Tensor other) -> Tensor(a!)
+  variants: function, method
+  tags: pointwise
+
+- func: ldexp.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
+  tags: pointwise
+
+- func: linspace(Scalar start, Scalar end, int steps, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: linspace
+
+- func: linspace.Tensor_Tensor(Tensor start, Tensor end, int steps, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+  category_override: factory
+  dispatch:
+    CompositeExplicitAutograd: linspace
+
+- func: linspace.Tensor_Scalar(Tensor start, Scalar end, int steps, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+  category_override: factory
+  dispatch:
+    CompositeExplicitAutograd: linspace
+
+- func: linspace.Scalar_Tensor(Scalar start, Tensor end, int steps, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+  category_override: factory
+  dispatch:
+    CompositeExplicitAutograd: linspace
+
+- func: linspace.out(Scalar start, Scalar end, int steps, *, Tensor(a!) out) -> Tensor(a!)
+  dispatch:
+    CPU, Meta: linspace_out
+    CUDA: linspace_cuda_out
+    MPS: linspace_out_mps
+
+- func: linspace.Tensor_Tensor_out(Tensor start, Tensor end, int steps, *, Tensor(a!) out) -> Tensor(a!)
+  category_override: factory
+  dispatch:
+    CompositeExplicitAutograd: linspace_out
+
+- func: linspace.Tensor_Scalar_out(Tensor start, Scalar end, int steps, *, Tensor(a!) out) -> Tensor(a!)
+  category_override: factory
+  dispatch:
+    CompositeExplicitAutograd: linspace_out
+
+- func: linspace.Scalar_Tensor_out(Scalar start, Tensor end, int steps, *, Tensor(a!) out) -> Tensor(a!)
+  category_override: factory
+  dispatch:
+    CompositeExplicitAutograd: linspace_out
+
+- func: log(Tensor self) -> Tensor
+  device_check: NoCheck # TensorIterator
+  structured_delegate: log.out
+  variants: function, method
+  tags: [core, pointwise]
+
+- func: log_(Tensor(a!) self) -> Tensor(a!)
+  device_check: NoCheck # TensorIterator
+  structured_delegate: log.out
+  variants: function, method
+  tags: pointwise
+
+- func: log.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
+  device_check: NoCheck # TensorIterator
+  structured: True
+  structured_inherits: TensorIteratorBase
+  dispatch:
+    CPU, CUDA, MPS, MTIA: log_out
+  tags: pointwise
+
+- func: log10(Tensor self) -> Tensor
+  device_check: NoCheck # TensorIterator
+  structured_delegate: log10.out
+  variants: function, method
+  tags: [core, pointwise]
+
+- func: log10_(Tensor(a!) self) -> Tensor(a!)
+  device_check: NoCheck # TensorIterator
+  structured_delegate: log10.out
+  variants: function, method
+  tags: pointwise
+
+- func: log10.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
+  device_check: NoCheck # TensorIterator
+  structured: True
+  structured_inherits: TensorIteratorBase
+  dispatch:
+    CPU, CUDA, MPS: log10_out
+  tags: pointwise
+
+- func: log1p(Tensor self) -> Tensor
+  device_check: NoCheck # TensorIterator
+  structured_delegate: log1p.out
+  variants: function, method
+  dispatch:
+    SparseCPU, SparseCUDA, SparseMPS: log1p_sparse
+    SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: log1p_sparse_csr
+  tags: [core, pointwise]
+
+- func: log1p_(Tensor(a!) self) -> Tensor(a!)
+  device_check: NoCheck # TensorIterator
+  structured_delegate: log1p.out
+  variants: function, method
+  dispatch:
+    SparseCPU, SparseCUDA, SparseMPS: log1p_sparse_
+    SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: log1p_sparse_csr_
+  tags: pointwise
+
+- func: log1p.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
+  device_check: NoCheck # TensorIterator
+  structured: True
+  structured_inherits: TensorIteratorBase
+  dispatch:
+    CPU, CUDA, MPS: log1p_out
+    SparseCPU, SparseCUDA, SparseMPS: log1p_sparse_out
+    SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: log1p_sparse_csr_out
+  tags: pointwise
+
+- func: log2(Tensor self) -> Tensor
+  device_check: NoCheck # TensorIterator
+  structured_delegate: log2.out
+  variants: function, method
+  tags: [core, pointwise]
+
+- func: log2_(Tensor(a!) self) -> Tensor(a!)
+  device_check: NoCheck # TensorIterator
+  structured_delegate: log2.out
+  variants: function, method
+  tags: pointwise
+
+- func: log2.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
+  device_check: NoCheck # TensorIterator
+  structured: True
+  structured_inherits: TensorIteratorBase
+  dispatch:
+    CPU, CUDA, MPS, MTIA: log2_out
+  tags: pointwise
+
+- func: logaddexp.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
+  structured: True
+  structured_inherits: TensorIteratorBase
+  dispatch:
+    CPU, CUDA, MPS: logaddexp_out
+  tags: pointwise
+
+- func: logaddexp(Tensor self, Tensor other) -> Tensor
+  variants: method, function
+  structured_delegate: logaddexp.out
+  tags: pointwise
+
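+# Illustrative usage (editor's sketch, not part of the schema): logaddexp
+# evaluates log(exp(a) + exp(b)) stably, where the naive form overflows:
+#
+#   import torch
+#   a = torch.tensor([1000.0])
+#   b = torch.tensor([1000.0])
+#   torch.log(torch.exp(a) + torch.exp(b))  # tensor([inf])
+#   torch.logaddexp(a, b)                   # tensor([1000.6931])
+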
+ structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: logaddexp2_out + tags: pointwise + +- func: logaddexp2(Tensor self, Tensor other) -> Tensor + variants: method, function + structured_delegate: logaddexp2.out + tags: pointwise + +- func: xlogy.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: xlogy.OutTensor + variants: function, method + tags: pointwise + +- func: xlogy.Scalar_Self(Scalar self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function + dispatch: + CompositeExplicitAutograd: xlogy + tags: pointwise + +- func: xlogy.Scalar_Other(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: xlogy + tags: pointwise + +# xlogy: inplace variant +- func: xlogy_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: xlogy.OutTensor + tags: pointwise + +- func: xlogy_.Scalar_Other(Tensor(a!) self, Scalar other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: xlogy_ + +# xlogy: out variant +- func: xlogy.OutTensor(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + variants: function + dispatch: + CPU, CUDA: xlogy_out + MPS: xlogy_out_mps + tags: pointwise + +- func: xlogy.OutScalar_Self(Scalar self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function + dispatch: + CompositeExplicitAutograd: xlogy_out + tags: pointwise + +- func: xlogy.OutScalar_Other(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function + dispatch: + CompositeExplicitAutograd: xlogy_out + tags: pointwise + +- func: logspace(Scalar start, Scalar end, int steps, float base=10.0, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: logspace + +- func: logspace.Tensor_Tensor(Tensor start, Tensor end, int steps, float base=10.0, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + category_override: factory + dispatch: + CompositeExplicitAutograd: logspace + +- func: logspace.Tensor_Scalar(Tensor start, Scalar end, int steps, float base=10.0, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + category_override: factory + dispatch: + CompositeExplicitAutograd: logspace + +- func: logspace.Scalar_Tensor(Scalar start, Tensor end, int steps, float base=10.0, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + category_override: factory + dispatch: + CompositeExplicitAutograd: logspace + +- func: logspace.out(Scalar start, Scalar end, int steps, float base=10.0, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, Meta: logspace_out + CUDA: logspace_cuda_out + +- func: logspace.Tensor_Tensor_out(Tensor start, Tensor end, int steps, float base=10.0, *, Tensor(a!) out) -> Tensor(a!) + category_override: factory + dispatch: + CompositeExplicitAutograd: logspace_out + +- func: logspace.Tensor_Scalar_out(Tensor start, Scalar end, int steps, float base=10.0, *, Tensor(a!) 
out) -> Tensor(a!) + category_override: factory + dispatch: + CompositeExplicitAutograd: logspace_out + +- func: logspace.Scalar_Tensor_out(Scalar start, Tensor end, int steps, float base=10.0, *, Tensor(a!) out) -> Tensor(a!) + category_override: factory + dispatch: + CompositeExplicitAutograd: logspace_out + +# log_softmax allows positional dtype, unlike most operators, because kwonly is BC-breaking when loading jit models. +- func: log_softmax.int(Tensor self, int dim, ScalarType? dtype=None) -> Tensor + variants: function, method + +- func: log_softmax.int_out(Tensor self, int dim, ScalarType? dtype=None, *, Tensor(a!) out) -> Tensor(a!) + variants: function + dispatch: + CompositeExplicitAutograd: log_softmax_out + +- func: log_softmax.Dimname(Tensor self, Dimname dim, *, ScalarType? dtype=None) -> Tensor + variants: function, method + +- func: _log_softmax(Tensor self, int dim, bool half_to_float) -> Tensor + structured_delegate: _log_softmax.out + tags: core + +- func: _log_softmax.out(Tensor self, int dim, bool half_to_float, *, Tensor(a!) out) -> Tensor(a!) + structured: True + dispatch: + CPU: log_softmax_cpu_out + CUDA: log_softmax_cuda_out + MTIA: log_softmax_mtia_out + MPS: log_softmax_mps_out + +- func: _log_softmax_backward_data(Tensor grad_output, Tensor output, int dim, ScalarType input_dtype) -> Tensor + structured_delegate: _log_softmax_backward_data.out + +- func: _log_softmax_backward_data.out(Tensor grad_output, Tensor output, int dim, ScalarType input_dtype, *, Tensor(a!) out) -> Tensor(a!) + structured: True + dispatch: + CPU: log_softmax_backward_cpu_out + CUDA: log_softmax_backward_cuda_out + MTIA: log_softmax_backward_mtia_out + MPS: log_softmax_backward_mps_out + +- func: _logcumsumexp(Tensor self, int dim) -> Tensor + dispatch: + CPU: _logcumsumexp_cpu + CUDA: _logcumsumexp_cuda + MPS: _logcumsumexp_mps + +- func: _logcumsumexp.out(Tensor self, int dim, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU: _logcumsumexp_out_cpu + CUDA: _logcumsumexp_out_cuda + MPS: _logcumsumexp_out_mps + +- func: logcumsumexp(Tensor self, int dim) -> Tensor + variants: function, method + dispatch: + CompositeExplicitAutograd: logcumsumexp + +- func: logcumsumexp.out(Tensor self, int dim, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: logcumsumexp_out + +- func: logcumsumexp.dimname(Tensor self, Dimname dim) -> Tensor + variants: function, method + +- func: logcumsumexp.dimname_out(Tensor self, Dimname dim, *, Tensor(a!) out) -> Tensor(a!) + +- func: logsumexp(Tensor self, int[1] dim, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: logsumexp + tags: reduction + +- func: logsumexp.out(Tensor self, int[1] dim, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + dispatch: + # calls squeeze + CompositeExplicitAutogradNonFunctional: logsumexp_out + tags: reduction + +- func: logsumexp.names(Tensor self, Dimname[1] dim, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + tags: reduction + +- func: logsumexp.names_out(Tensor self, Dimname[1] dim, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) 
+ device_check: NoCheck # TensorIterator + tags: reduction + +- func: margin_ranking_loss(Tensor input1, Tensor input2, Tensor target, float margin=0.0, int reduction=Mean) -> Tensor + +- func: matmul(Tensor self, Tensor other) -> Tensor + variants: function, method + dispatch: + CompositeImplicitAutograd: matmul + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: matmul_nested + +- func: matmul_backward(Tensor grad, Tensor self, Tensor other, bool[2] mask) -> (Tensor, Tensor) + dispatch: + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: matmul_backward_nested + autogen: matmul_backward.out + +- func: matmul.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeImplicitAutograd: matmul_out + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: matmul_out_nested + +# Alias to linalg.matrix_power +- func: matrix_power(Tensor self, int n) -> Tensor + variants: function, method + +# Alias to linalg.matrix_power +- func: matrix_power.out(Tensor self, int n, *, Tensor(a!) out) -> Tensor(a!) + +# Alias to linalg.matrix_exp +- func: matrix_exp(Tensor self) -> Tensor + variants: function, method + +# This function should be deprecated in favor of differential_analytic_matrix_function in FunctionsManual.cpp +- func: matrix_exp_backward(Tensor self, Tensor grad) -> Tensor + +# DEPRECATED: Use torch.aminmax instead +- func: _aminmax(Tensor self) -> (Tensor, Tensor) + dispatch: + CPU, CUDA: _aminmax_all + autogen: _aminmax.out + +# DEPRECATED: Use torch.aminmax instead +- func: _aminmax.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor, Tensor) + dispatch: + CPU, CUDA: _aminmax + autogen: _aminmax.dim_out + +- func: aminmax(Tensor self, *, int? dim=None, bool keepdim=False) -> (Tensor min, Tensor max) + device_check: NoCheck # TensorIterator + structured_delegate: aminmax.out + variants: function, method + tags: reduction + +- func: aminmax.out(Tensor self, *, int? dim=None, bool keepdim=False, Tensor(a!) min, Tensor(b!) max) -> (Tensor(a!) min, Tensor(b!) max) + device_check: NoCheck # TensorIterator + structured: True + dispatch: + CPU, CUDA, MTIA: aminmax_out + MPS: aminmax_out_mps + tags: reduction + +- func: _compute_linear_combination(Tensor input, Tensor coefficients) -> Tensor + dispatch: + CPU, CUDA: _compute_linear_combination + +- func: _compute_linear_combination.out(Tensor input, Tensor coefficients, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA: _compute_linear_combination_out + +- func: max.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices) + device_check: NoCheck # TensorIterator + structured_delegate: max.dim_max + variants: function, method + dispatch: + QuantizedCPU, QuantizedCUDA: qmax + tags: [core, reduction] + +- func: max.dim_max(Tensor self, int dim, bool keepdim=False, *, Tensor(a!) max, Tensor(b!) max_values) -> (Tensor(a!) values, Tensor(b!) indices) + device_check: NoCheck # TensorIterator + structured: True + precomputed: + - dim -> int dim + dispatch: + CPU, CUDA, MTIA: max_out + MPS: max_out_mps + tags: reduction + +- func: max.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices) + device_check: NoCheck # TensorIterator + variants: function, method + tags: reduction + +- func: max.names_dim_max(Tensor self, Dimname dim, bool keepdim=False, *, Tensor(a!) max, Tensor(b!) max_values) -> (Tensor(a!) values, Tensor(b!) 
indices) + device_check: NoCheck # TensorIterator + tags: reduction + +- func: value_selecting_reduction_backward(Tensor grad, int dim, Tensor indices, SymInt[] sizes, bool keepdim) -> Tensor + variants: function + device_check: NoCheck + device_guard: False + dispatch: + CompositeImplicitAutograd: value_selecting_reduction_backward_symint + NestedTensorCPU, NestedTensorCUDA: value_selecting_reduction_backward_nested_symint + +- func: amax(Tensor self, int[1] dim=[], bool keepdim=False) -> Tensor + variants: function, method + structured_delegate: amax.out + tags: [core, reduction] + +- func: amax.out(Tensor self, int[1] dim=[], bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + structured: True + dispatch: + CPU, CUDA, MTIA: amax_out + MPS: amax_out_mps + tags: reduction + +# Return: (Tensor output, Tensor indices) +- func: max_pool1d_with_indices(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=0, int[1] dilation=1, bool ceil_mode=False) -> (Tensor, Tensor) + +- func: max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=0, int[1] dilation=1, bool ceil_mode=False) -> Tensor + +- func: max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, int[2] dilation=1, bool ceil_mode=False) -> Tensor + dispatch: + CompositeImplicitAutograd: max_pool2d + MPS: mps_max_pool2d + +- func: max_pool2d_backward(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, int[2] dilation=1, bool ceil_mode=False) -> Tensor + dispatch: + MPS: mps_max_pool2d_backward + autogen: max_pool2d_backward.out + +- func: mkldnn_max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, int[2] dilation=1, bool ceil_mode=False) -> Tensor + dispatch: + MkldnnCPU: mkldnn_max_pool2d + autogen: mkldnn_max_pool2d.out + +- func: mkldnn_max_pool2d_backward(Tensor grad_output, Tensor output, Tensor input, int[2] kernel_size, int[2] stride=[], int[2] padding=0, int[2] dilation=1, bool ceil_mode=False) -> Tensor + dispatch: + MkldnnCPU: mkldnn_max_pool2d_backward + autogen: mkldnn_max_pool2d_backward.out + +- func: mkldnn_max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=0, int[3] dilation=1, bool ceil_mode=False) -> Tensor + dispatch: + MkldnnCPU: mkldnn_max_pool3d + autogen: mkldnn_max_pool3d.out + +- func: mkldnn_max_pool3d_backward(Tensor grad_output, Tensor output, Tensor input, int[3] kernel_size, int[3] stride=[], int[3] padding=0, int[3] dilation=1, bool ceil_mode=False) -> Tensor + dispatch: + MkldnnCPU: mkldnn_max_pool3d_backward + autogen: mkldnn_max_pool3d_backward.out + +- func: quantized_max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=0, int[1] dilation=1, bool ceil_mode=False) -> Tensor + dispatch: + QuantizedCPU: quantized_max_pool1d + autogen: quantized_max_pool1d.out + +- func: quantized_max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, int[2] dilation=1, bool ceil_mode=False) -> Tensor + dispatch: + QuantizedCPU: quantized_max_pool2d + QuantizedCUDA: quantized_max_pool2d_cudnn + autogen: quantized_max_pool2d.out + +- func: quantized_max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=0, int[3] dilation=1, bool ceil_mode=False) -> Tensor + dispatch: + QuantizedCPU: quantized_max_pool3d + autogen: quantized_max_pool3d.out + +- func: max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=0, int[3] dilation=1, bool ceil_mode=False) -> Tensor + +# The CPU and GPU dispatch 
variants are named weirdly here because otherwise there +# are namespacing issues in C++ +- func: mean(Tensor self, *, ScalarType? dtype=None) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: mean + tags: [core, reduction] + +# For normal naming convention this should be `mean.out`. However since we already have `mean.out` we have to rename this. +- func: mean.dtype_out(Tensor self, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + dispatch: + CompositeExplicitAutograd: mean_dtype_out + tags: reduction + +- func: mean.dim(Tensor self, int[1]? dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + structured_delegate: mean.out + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + QuantizedCPU: mean_quantized_cpu + tags: [core, reduction] + +- func: mean.out(Tensor self, int[1]? dim, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) + structured: True + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA: mean_out + MPS: mean_out_mps + QuantizedCPU: mean_out_quantized_cpu + tags: reduction + +- func: mean.names_dim(Tensor self, Dimname[1] dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + tags: reduction + +- func: mean.names_out(Tensor self, Dimname[1] dim, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + tags: reduction + +- func: nanmean(Tensor self, int[1]? dim=None, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + device_check: NoCheck # Composite + variants: function, method + +- func: nanmean.out(Tensor self, int[1]? dim=None, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # Composite + +- func: median(Tensor self) -> Tensor + variants: function, method + dispatch: + CPU: median_cpu + CUDA: median_cuda + MPS: median_mps + autogen: median.out + +- func: median.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices) + variants: function, method + dispatch: + CompositeExplicitAutograd: median + +- func: median.dim_values(Tensor self, int dim, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices) + dispatch: + CPU: median_out_cpu + CUDA: median_out_cuda + MPS: median_out_mps + +- func: median.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices) + variants: function, method + +- func: median.names_dim_values(Tensor self, Dimname dim, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices) + +- func: nanmedian(Tensor self) -> Tensor + variants: function, method + dispatch: + CPU: nanmedian_cpu + CUDA: nanmedian_cuda + MPS: nanmedian_mps + autogen: nanmedian.out + +- func: nanmedian.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices) + variants: function, method + dispatch: + CompositeExplicitAutograd: nanmedian + +- func: nanmedian.dim_values(Tensor self, int dim, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) 
indices) + dispatch: + CPU: nanmedian_out_cpu + CUDA: nanmedian_out_cuda + MPS: nanmedian_out_mps + +- func: nanmedian.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices) + variants: function, method + +- func: nanmedian.names_dim_values(Tensor self, Dimname dim, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices) + +- func: min.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices) + device_check: NoCheck # TensorIterator + structured_delegate: min.dim_min + variants: function, method + dispatch: + QuantizedCPU, QuantizedCUDA: qmin + tags: [core, reduction] + +- func: min.dim_min(Tensor self, int dim, bool keepdim=False, *, Tensor(a!) min, Tensor(b!) min_indices) -> (Tensor(a!) values, Tensor(b!) indices) + device_check: NoCheck # TensorIterator + structured: True + precomputed: + - dim -> int dim + dispatch: + CPU, CUDA, MTIA: min_out + MPS: min_out_mps + tags: reduction + +- func: min.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices) + device_check: NoCheck # TensorIterator + variants: function, method + tags: reduction + +- func: min.names_dim_min(Tensor self, Dimname dim, bool keepdim=False, *, Tensor(a!) min, Tensor(b!) min_indices) -> (Tensor(a!) values, Tensor(b!) indices) + device_check: NoCheck # TensorIterator + tags: reduction + +- func: amin(Tensor self, int[1] dim=[], bool keepdim=False) -> Tensor + variants: function, method + structured_delegate: amin.out + tags: [core, reduction] + +- func: amin.out(Tensor self, int[1] dim=[], bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + structured: True + dispatch: + CPU, CUDA, MTIA: amin_out + MPS: amin_out_mps + tags: reduction + +# TODO: Add this function to MPS dispatch key so that we avoid declaring it in +# native_functions.yaml +# https://github.com/pytorch/pytorch/issues/77394 +- func: _mps_convolution(Tensor self, Tensor weight, Tensor? bias, SymInt[] padding, SymInt[] stride, SymInt[] dilation, SymInt groups) -> Tensor + dispatch: + MPS: _mps_convolution + autogen: _mps_convolution.out + +- func: mps_convolution_backward(Tensor self, Tensor grad_output, Tensor weight, SymInt[] padding, SymInt[] stride, SymInt[] dilation, SymInt groups, bool[3] output_mask) -> (Tensor, Tensor, Tensor) + dispatch: + MPS: mps_convolution_backward + autogen: mps_convolution_backward.out + +- func: mkldnn_convolution(Tensor self, Tensor weight, Tensor? bias, SymInt[] padding, SymInt[] stride, SymInt[] dilation, SymInt groups) -> Tensor + dispatch: + CompositeExplicitAutograd: mkldnn_convolution + autogen: mkldnn_convolution.out + +- func: mkldnn_rnn_layer(Tensor input, Tensor weight0, Tensor weight1, Tensor weight2, Tensor weight3, Tensor hx_, Tensor cx_, bool reverse, int[] batch_sizes, int mode, int hidden_size, int num_layers, bool has_biases, bool bidirectional, bool batch_first, bool train) -> (Tensor, Tensor, Tensor, Tensor) + dispatch: + CPU: mkldnn_rnn_layer + MkldnnCPU: mkldnn_rnn_layer + autogen: mkldnn_rnn_layer.out + +- func: mkldnn_rnn_layer_backward(Tensor input, Tensor weight1, Tensor weight2, Tensor weight3, Tensor weight4, Tensor hx_, Tensor cx_tmp, Tensor output, Tensor hy_, Tensor cy_, Tensor? grad_output, Tensor? grad_hy, Tensor? 
grad_cy, bool reverse, int mode, int hidden_size, int num_layers, bool has_biases, bool train, bool bidirectional, int[] batch_sizes, bool batch_first, Tensor workspace) -> (Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor) + dispatch: + CPU: mkldnn_rnn_layer_backward + autogen: mkldnn_rnn_layer_backward.out + +- func: miopen_batch_norm(Tensor input, Tensor weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float exponential_average_factor, float epsilon) -> (Tensor, Tensor, Tensor) + dispatch: + CUDA: miopen_batch_norm + autogen: miopen_batch_norm.out + +- func: miopen_batch_norm_backward(Tensor input, Tensor grad_output, Tensor weight, Tensor? running_mean, Tensor? running_var, Tensor? save_mean, Tensor? save_var, float epsilon) -> (Tensor, Tensor, Tensor) + dispatch: + CUDA: miopen_batch_norm_backward + autogen: miopen_batch_norm_backward.out + +- func: miopen_convolution(Tensor self, Tensor weight, Tensor? bias, SymInt[] padding, SymInt[] stride, SymInt[] dilation, SymInt groups, bool benchmark, bool deterministic) -> Tensor + dispatch: + CUDA: miopen_convolution + autogen: miopen_convolution.out + +- func: miopen_convolution_transpose(Tensor self, Tensor weight, Tensor? bias, SymInt[] padding, SymInt[] output_padding, SymInt[] stride, SymInt[] dilation, SymInt groups, bool benchmark, bool deterministic) -> Tensor + dispatch: + CUDA: miopen_convolution_transpose + autogen: miopen_convolution_transpose.out + +- func: miopen_depthwise_convolution(Tensor self, Tensor weight, Tensor? bias, SymInt[] padding, SymInt[] stride, SymInt[] dilation, SymInt groups, bool benchmark, bool deterministic) -> Tensor + dispatch: + CUDA: miopen_depthwise_convolution + autogen: miopen_depthwise_convolution.out + +- func: miopen_convolution_relu(Tensor self, Tensor weight, Tensor? bias, SymInt[] stride, SymInt[] padding, SymInt[] dilation, SymInt groups) -> Tensor + dispatch: + CUDA: miopen_convolution_relu + +- func: miopen_convolution_add_relu(Tensor self, Tensor weight, Tensor z, Scalar? alpha, Tensor? bias, SymInt[] stride, SymInt[] padding, SymInt[] dilation, SymInt groups) -> Tensor + dispatch: + CUDA: miopen_convolution_add_relu + +- func: miopen_rnn(Tensor input, Tensor[] weight, int weight_stride0, Tensor hx, Tensor? cx, int mode, int hidden_size, int num_layers, bool batch_first, float dropout, bool train, bool bidirectional, int[] batch_sizes, Tensor? dropout_state) -> (Tensor, Tensor, Tensor, Tensor, Tensor) + dispatch: + CUDA: miopen_rnn + autogen: miopen_rnn.out + tags: nondeterministic_seeded + + +- func: miopen_rnn_backward(Tensor input, Tensor[] weight, int weight_stride0, Tensor weight_buf, Tensor hx, Tensor? cx, Tensor output, Tensor? grad_output, Tensor? grad_hy, Tensor? grad_cy, int mode, int hidden_size, int num_layers, bool batch_first, float dropout, bool train, bool bidirectional, int[] batch_sizes, Tensor? dropout_state, Tensor reserve, bool[4] output_mask) -> (Tensor, Tensor, Tensor, Tensor[]) + dispatch: + CUDA: miopen_rnn_backward + autogen: miopen_rnn_backward.out + +- func: mm(Tensor self, Tensor mat2) -> Tensor + structured_delegate: mm.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: _sparse_mm + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: _sparse_csr_mm + tags: core + +- func: mm.out(Tensor self, Tensor mat2, *, Tensor(a!) out) -> Tensor(a!) 
+ structured: True + dispatch: + CPU: mm_out_cpu + CUDA: mm_out_cuda + MTIA: mm_out_mtia + MPS: mm_out_mps + XPU: mm_out_xpu + SparseCPU, SparseCUDA, SparseMPS: _sparse_mm_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: _sparse_csr_mm_out + +- func: mm.dtype(Tensor self, Tensor mat2, ScalarType out_dtype) -> Tensor + dispatch: + CUDA: _mm_dtype_cuda + +- func: mm.dtype_out(Tensor self, Tensor mat2, ScalarType out_dtype, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CUDA: _mm_dtype_out_cuda + +- func: _int_mm(Tensor self, Tensor mat2) -> Tensor + dispatch: + CPU: _int_mm_cpu + CUDA: _int_mm_cuda + XPU: _int_mm_xpu + +- func: _int_mm.out(Tensor self, Tensor mat2, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU: _int_mm_out_cpu + CUDA: _int_mm_out_cuda + XPU: _int_mm_out_xpu + +- func: _convert_weight_to_int4pack(Tensor self, int innerKTiles) -> Tensor + dispatch: + CUDA: _convert_weight_to_int4pack_cuda + MPS: _convert_weight_to_int4pack_mps + +- func: _weight_int4pack_mm(Tensor self, Tensor mat2, int qGroupSize, Tensor qScaleAndZeros) -> Tensor + dispatch: + MPS: _weight_int4pack_mm_mps + CUDA: _weight_int4pack_mm_cuda + +- func: _weight_int4pack_mm_with_scales_and_zeros(Tensor self, Tensor mat2, int qGroupSize, Tensor qScale, Tensor qZeros) -> Tensor + dispatch: + XPU: _weight_int4pack_mm_xpu + +# Split int4 pack weight between cpu and other devices due to +# https://github.com/pytorch/ao/issues/1117#issuecomment-2451252756. +- func: _convert_weight_to_int4pack_for_cpu(Tensor self, int innerKTiles) -> Tensor + dispatch: + CPU: _convert_weight_to_int4pack_cpu + +- func: _weight_int4pack_mm_for_cpu(Tensor self, Tensor mat2, int qGroupSize, Tensor qScaleAndZeros) -> Tensor + dispatch: + CPU: _weight_int4pack_mm_cpu + +- func: _dyn_quant_pack_4bit_weight(Tensor weights, Tensor scales_zeros, Tensor? bias, int block_size, int in_features, int out_features) -> Tensor + dispatch: + CPU: _dyn_quant_pack_4bit_weight_cpu + +- func: _dyn_quant_matmul_4bit(Tensor inp, Tensor packed_weights, int block_size, int in_features, int out_features) -> Tensor + dispatch: + CPU: _dyn_quant_matmul_4bit_cpu + +- func: _weight_int8pack_mm(Tensor self, Tensor mat2, Tensor scales) -> Tensor + dispatch: + CPU: _weight_int8pack_mm_cpu + CUDA: _weight_int8pack_mm_cuda + MPS: _weight_int8pack_mm_mps + XPU: _weight_int8pack_mm_xpu + +- func: _sparse_mm(Tensor sparse, Tensor dense) -> Tensor + python_module: sparse + +- func: _sparse_mm.reduce(Tensor sparse, Tensor dense, str reduce) -> Tensor + python_module: sparse + +- func: _sparse_sparse_matmul(Tensor self, Tensor other) -> Tensor + dispatch: + SparseCPU: sparse_sparse_matmul_cpu + SparseCUDA: sparse_sparse_matmul_cuda + SparseMPS: sparse_sparse_matmul_mps + autogen: _sparse_sparse_matmul.out + +- func: mode(Tensor self, int dim=-1, bool keepdim=False) -> (Tensor values, Tensor indices) + variants: function, method + dispatch: + CPU, CUDA: mode + +- func: mode.values(Tensor self, int dim=-1, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices) + dispatch: + CompositeExplicitAutograd: mode_out + +- func: mode.dimname(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices) + variants: function, method + +- func: mode.dimname_out(Tensor self, Dimname dim, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) 
indices) + +- func: mul.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: mul.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: mul_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: mul_sparse_csr + MkldnnCPU: mkldnn_mul + ZeroTensor: mul_zerotensor + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_mul_Tensor + tags: [core, pointwise] + +- func: mul_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: mul.out + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: mul_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: mul_sparse_csr_ + MkldnnCPU: mkldnn_mul_ + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_mul__Tensor + tags: pointwise + +- func: mul.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS, MTIA: mul_out + SparseCPU: mul_out_sparse_cpu + SparseCUDA: mul_out_sparse_cuda + SparseMPS: mul_out_sparse_mps + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: mul_out_sparse_csr + MkldnnCPU: mkldnn_mul_out + tags: pointwise + # For C++ only, until we have conversion from C++ numbers to Tensor + +- func: mul.Scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: mul + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: mul_scalar_sparse_csr + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_mul_Scalar + tags: [core, pointwise] + +- func: mul_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CompositeExplicitAutograd: mul_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: mul__scalar_sparse_csr + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_mul__Scalar + autogen: mul.Scalar_out + tags: pointwise +# multiply, alias for mul + +- func: multiply.Tensor(Tensor self, Tensor other) -> Tensor + variants: function, method + +- func: multiply_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + variants: method + +- func: multiply.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + +- func: multiply.Scalar(Tensor self, Scalar other) -> Tensor + variants: function, method + +- func: multiply_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + variants: method + +- func: mv(Tensor self, Tensor vec) -> Tensor + variants: function, method + dispatch: + CompositeExplicitAutograd: mv + SparseCPU, SparseCUDA, SparseMPS: mv_sparse + +- func: mv.out(Tensor self, Tensor vec, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: mv_out + +- func: mvlgamma.out(Tensor self, int p, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA: mvlgamma_out + tags: pointwise + +- func: mvlgamma(Tensor self, int p) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: mvlgamma + tags: pointwise + +- func: mvlgamma_(Tensor(a!) self, int p) -> Tensor(a!) 
+  device_check: NoCheck   # TensorIterator
+  variants: method
+  dispatch:
+    CompositeExplicitAutograd: mvlgamma_
+  tags: pointwise
+
+- func: narrow_copy(Tensor self, int dim, SymInt start, SymInt length) -> Tensor
+  variants: function, method
+  dispatch:
+    CPU: narrow_copy_dense_cpu
+    SparseCPU, SparseCUDA, SparseMPS: narrow_copy_sparse
+    CompositeExplicitAutogradNonFunctional: narrow_copy_dense_symint
+  tags: view_copy
+
+- func: narrow_copy.out(Tensor self, int dim, SymInt start, SymInt length, *, Tensor(a!) out) -> Tensor(a!)
+  dispatch:
+    CPU: narrow_copy_dense_cpu_out
+
+- func: narrow(Tensor(a) self, int dim, SymInt start, SymInt length) -> Tensor(a)
+  variants: function, method
+  device_check: NoCheck
+  device_guard: False
+  dispatch:
+    CompositeImplicitAutograd: narrow_symint
+    NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: narrow_nested_symint
+
+- func: narrow.Tensor(Tensor(a) self, int dim, Tensor start, SymInt length) -> Tensor(a)
+  variants: function, method
+  device_check: NoCheck
+  device_guard: False
+  dispatch:
+    CompositeImplicitAutograd: narrow_tensor_symint
+
+- func: native_batch_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float momentum, float eps) -> (Tensor, Tensor, Tensor)
+  dispatch:
+    CPU: batch_norm_cpu
+    CUDA: batch_norm_cuda
+    MPS: batch_norm_mps
+    MkldnnCPU: mkldnn_batch_norm
+
+- func: native_batch_norm.out(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float momentum, float eps, *, Tensor(a!) out, Tensor(b!) save_mean, Tensor(c!) save_invstd) -> (Tensor(a!), Tensor(b!), Tensor(c!))
+  dispatch:
+    CUDA: batch_norm_cuda_out
+    MPS: batch_norm_mps_out
+    CPU: batch_norm_cpu_out
+
+# TODO: In 2 weeks, we should make native_batch_norm composite implicit so that this correct schema percolates correctly through our dispatching
+- func: _native_batch_norm_legit(Tensor input, Tensor? weight, Tensor? bias, Tensor(a!) running_mean, Tensor(b!) running_var, bool training, float momentum, float eps) -> (Tensor, Tensor, Tensor)
+  dispatch:
+    CPU: _batch_norm_legit_cpu
+    CUDA: _batch_norm_legit_cuda
+    MPS: _batch_norm_legit_mps
+    MkldnnCPU: _mkldnn_batch_norm_legit
+  autogen: _native_batch_norm_legit_functional
+  tags: core
+
+# HACK: identical to _native_batch_norm_legit, but training is known to be False,
+# so we know that running stats will not be mutated.
+# The real fix here is batch norm consolidation.
+- func: _native_batch_norm_legit_no_training(Tensor input, Tensor? weight, Tensor? bias, Tensor running_mean, Tensor running_var, float momentum, float eps) -> (Tensor, Tensor, Tensor)
+  dispatch:
+    CompositeExplicitAutograd: _batch_norm_legit_no_training
+  autogen: _native_batch_norm_legit_no_training.out
+  tags: core
+
+- func: _native_batch_norm_legit.out(Tensor input, Tensor? weight, Tensor? bias, Tensor(a!) running_mean, Tensor(b!) running_var, bool training, float momentum, float eps, *, Tensor(d!) out, Tensor(e!) save_mean, Tensor(f!) save_invstd) -> (Tensor(d!), Tensor(e!), Tensor(f!))
+  dispatch:
+    CPU: _batch_norm_legit_cpu_out
+    CUDA: _batch_norm_legit_cuda_out
+    MPS: _batch_norm_legit_mps_out
+
+- func: _native_batch_norm_legit.no_stats(Tensor input, Tensor? weight, Tensor?
bias, bool training, float momentum, float eps) -> (Tensor, Tensor, Tensor) + dispatch: + CPU: _batch_norm_legit_no_stats_cpu + CUDA: _batch_norm_legit_no_stats_cuda + MPS: _batch_norm_legit_no_stats_mps + MkldnnCPU: _mkldnn_batch_norm_legit_no_stats + tags: core + +- func: _native_batch_norm_legit.no_stats_out(Tensor input, Tensor? weight, Tensor? bias, bool training, float momentum, float eps, *, Tensor(a!) out, Tensor(b!) save_mean, Tensor(c!) save_invstd) -> (Tensor(a!), Tensor(b!), Tensor(c!)) + dispatch: + CPU: _batch_norm_legit_no_stats_cpu_out + CUDA: _batch_norm_legit_no_stats_cuda_out + MPS: _batch_norm_legit_no_stats_mps_out + +- func: batch_norm_stats(Tensor input, float eps) -> (Tensor, Tensor) + dispatch: + CUDA: batch_norm_stats_cuda + autogen: batch_norm_stats.out + +- func: batch_norm_elemt(Tensor input, Tensor? weight, Tensor? bias, Tensor mean, Tensor invstd, float eps) -> Tensor + dispatch: + CUDA: batch_norm_elemt_cuda + +- func: batch_norm_elemt.out(Tensor input, Tensor? weight, Tensor? bias, Tensor mean, Tensor invstd, float eps, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CUDA: batch_norm_elemt_cuda_out + +# for backward compatibility +- func: batch_norm_gather_stats(Tensor input, Tensor mean, Tensor invstd, Tensor? running_mean, Tensor? running_var, float momentum, float eps, int count) -> (Tensor, Tensor) + dispatch: + CUDA: batch_norm_gather_stats_cuda + autogen: batch_norm_gather_stats.out + +- func: batch_norm_gather_stats_with_counts(Tensor input, Tensor mean, Tensor invstd, Tensor? running_mean, Tensor? running_var, float momentum, float eps, Tensor counts) -> (Tensor, Tensor) + dispatch: + CUDA: batch_norm_gather_stats_with_counts_cuda + autogen: batch_norm_gather_stats_with_counts.out + +- func: native_batch_norm_backward(Tensor grad_out, Tensor input, Tensor? weight, Tensor? running_mean, Tensor? running_var, Tensor? save_mean, Tensor? save_invstd, bool train, float eps, bool[3] output_mask) -> (Tensor, Tensor, Tensor) + dispatch: + CPU: batch_norm_backward_cpu + CUDA: batch_norm_backward_cuda + MPS: batch_norm_backward_mps + MkldnnCPU: mkldnn_batch_norm_backward + autogen: native_batch_norm_backward.out + +- func: batch_norm_backward_reduce(Tensor grad_out, Tensor input, Tensor mean, Tensor invstd, Tensor? weight, bool input_g, bool weight_g, bool bias_g) -> (Tensor, Tensor, Tensor, Tensor) + dispatch: + CUDA: batch_norm_backward_reduce_cuda + autogen: batch_norm_backward_reduce.out + +- func: batch_norm_backward_elemt(Tensor grad_out, Tensor input, Tensor mean, Tensor invstd, Tensor? weight, Tensor sum_dy, Tensor sum_dy_xmu, Tensor count) -> Tensor + dispatch: + CUDA: batch_norm_backward_elemt_cuda + autogen: batch_norm_backward_elemt.out + +- func: batch_norm_update_stats(Tensor input, Tensor? running_mean, Tensor? running_var, float momentum) -> (Tensor, Tensor) + dispatch: + CPU: batch_norm_update_stats_cpu + CUDA: batch_norm_update_stats_cuda + autogen: batch_norm_update_stats.out + +- func: is_vulkan_available() -> bool + +- func: _nnpack_available() -> bool + +- func: _nnpack_spatial_convolution(Tensor input, Tensor weight, Tensor? bias, SymInt[2] padding, SymInt[2] stride=1) -> Tensor + variants: function + dispatch: + CompositeExplicitAutograd: _nnpack_spatial_convolution + autogen: _nnpack_spatial_convolution.out + +- func: ones.names(int[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? 
pin_memory=None) -> Tensor + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: ones + autogen: ones.names_out + +- func: ones(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: ones + +- func: ones.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: ones_out + +- func: ones_like(Tensor self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + dispatch: + # NB: Although this composite mutates on the inside, it is + # non-differentiable so NonFunctional doesn't apply + CompositeExplicitAutograd: ones_like + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: ones_like + autogen: ones_like.out + +- func: pairwise_distance(Tensor x1, Tensor x2, float p=2, float eps=1e-06, bool keepdim=False) -> Tensor + +- func: cdist(Tensor x1, Tensor x2, float p=2, int? compute_mode=None) -> Tensor + +- func: _euclidean_dist(Tensor x1, Tensor x2) -> Tensor + dispatch: + CompositeExplicitAutograd: _euclidean_dist + autogen: _euclidean_dist.out + +- func: _cdist_forward(Tensor x1, Tensor x2, float p, int? compute_mode) -> Tensor + dispatch: + CPU, CUDA: _cdist_forward + MTIA: _cdist_forward_mtia + MPS: _cdist_forward_mps + autogen: _cdist_forward.out + tags: core + +- func: _cdist_backward(Tensor grad, Tensor x1, Tensor x2, float p, Tensor cdist) -> Tensor + dispatch: + CPU, CUDA: _cdist_backward + autogen: _cdist_backward.out + +- func: pdist(Tensor self, float p=2) -> Tensor + +- func: _pdist_forward(Tensor self, float p=2) -> Tensor + dispatch: + CPU, CUDA: _pdist_forward + autogen: _pdist_forward.out + tags: core + +- func: _pdist_backward(Tensor grad, Tensor self, float p, Tensor pdist) -> Tensor + dispatch: + CPU, CUDA: _pdist_backward + autogen: _pdist_backward.out + +- func: cosine_similarity(Tensor x1, Tensor x2, int dim=1, float eps=1e-08) -> Tensor + variants: function + +- func: permute(Tensor(a) self, int[] dims) -> Tensor(a) + variants: function, method + dispatch: + CompositeExplicitAutograd: permute + MPS: permute_mps + SparseCPU, SparseCUDA, SparseMPS: permute_sparse_coo + tags: core + +- func: movedim.intlist(Tensor(a) self, int[] source, int[] destination) -> Tensor(a) + variants: function, method + +- func: movedim.int(Tensor(a) self, int source, int destination) -> Tensor(a) + variants: function, method + +# moveaxis, alias for movedim +- func: moveaxis.intlist(Tensor(a) self, int[] source, int[] destination) -> Tensor(a) + variants: function, method + +- func: moveaxis.int(Tensor(a) self, int source, int destination) -> Tensor(a) + variants: function, method + +# Only exposed from C++ -- in Python, +# we expose it as an attribute `T`, not a function. +# +# I'd like to name this "T" in C++ too, but +# calling a native function "T" causes undefined +# behavior on Windows, for reasons I don't understand +# (maybe related to capital letter collation somehow...) 
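+#
+# Illustration for the attribute-backed entries below (Python; assumed, not
+# part of the schema): these surface as attributes rather than method calls.
+#   >>> x = torch.randn(2, 3)
+#   >>> x.T.shape                 # backed by numpy_T
+#   torch.Size([3, 2])
+#   >>> x.mT.shape                # transpose of the last two dims
+#   torch.Size([3, 2])
+#   >>> torch.equal(x.mH, x.mT)   # conjugate transpose; equal for real dtypes
+#   True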
+- func: numpy_T(Tensor(a) self) -> Tensor(a)
+  variants: method
+
+# Exposed on Python as an attribute 'H'
+- func: matrix_H(Tensor(a) self) -> Tensor(a)
+  variants: method
+
+# Exposed on Python as an attribute 'mT'
+- func: mT(Tensor(a) self) -> Tensor(a)
+  variants: method
+
+# Exposed on Python as an attribute 'mH'
+- func: mH(Tensor(a) self) -> Tensor(a)
+  variants: method
+
+- func: adjoint(Tensor(a) self) -> Tensor(a)
+  variants: function, method
+
+- func: pixel_shuffle(Tensor self, int upscale_factor) -> Tensor
+  dispatch:
+    CPU: pixel_shuffle_cpu
+    MPS: pixel_shuffle_mps
+    CompositeExplicitAutogradNonFunctional: math_pixel_shuffle
+  autogen: pixel_shuffle.out
+
+- func: pixel_unshuffle(Tensor self, int downscale_factor) -> Tensor
+  dispatch:
+    CPU: pixel_unshuffle_cpu
+    MPS: pixel_unshuffle_mps
+    CompositeExplicitAutogradNonFunctional: math_pixel_unshuffle
+  autogen: pixel_unshuffle.out
+
+- func: channel_shuffle(Tensor self, SymInt groups) -> Tensor
+  dispatch:
+    CPU, CUDA: channel_shuffle
+    QuantizedCPU: channel_shuffle_quantized_cpu
+  autogen: channel_shuffle.out
+
+- func: native_channel_shuffle(Tensor self, SymInt groups) -> Tensor
+  dispatch:
+    CPU: channel_shuffle_cpu
+    CompositeImplicitAutograd: math_channel_shuffle
+
+- func: is_pinned(Tensor self, Device? device=None) -> bool
+  variants: method
+  dispatch:
+    # the NestedTensor keys are necessary because NestedTensor has been removed
+    # from the CompositeExplicitAutograd keyset; see Note [NestedTensor Not Included in Backend Keys]
+    CompositeExplicitAutograd, NestedTensorCPU: is_pinned
+    SparseCsrCPU: is_pinned_sparse_compressed
+    SparseCPU: is_pinned_sparse_coo
+
+# TODO: add a copy kwarg that guarantees that the tensor is put into fresh
+# pinned memory
+- func: pin_memory(Tensor(a) self, Device? device=None) -> Tensor(a)
+  variants: method
+
+# Unlike pin_memory, this is guaranteed to give a new non-aliasing tensor
+- func: _pin_memory(Tensor self, Device? device=None) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: _pin_memory
+    NestedTensorCPU: _pin_memory_nested
+    SparseCPU: _pin_memory_sparse_coo
+    SparseCsrCPU: _pin_memory_sparse_compressed
+  autogen: _pin_memory.out
+
+- func: pinverse(Tensor self, float rcond=1e-15) -> Tensor
+  variants: function, method
+
+- func: poisson_nll_loss(Tensor input, Tensor target, bool log_input, bool full, float eps, int reduction) -> Tensor
+  variants: function
+
+- func: rad2deg(Tensor self) -> Tensor
+  variants: function, method
+  dispatch:
+    CompositeExplicitAutograd: rad2deg
+    SparseCPU, SparseCUDA, SparseMPS: rad2deg_sparse
+    SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: rad2deg_sparse_csr
+  tags: pointwise
+
+- func: rad2deg_(Tensor(a!) self) -> Tensor(a!)
+  variants: function, method
+  dispatch:
+    CompositeExplicitAutograd: rad2deg_
+    SparseCPU, SparseCUDA, SparseMPS: rad2deg_sparse_
+    SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: rad2deg_sparse_csr_
+  tags: pointwise
+
+- func: rad2deg.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
+  dispatch:
+    CompositeExplicitAutograd: rad2deg_out
+    SparseCPU, SparseCUDA, SparseMPS: rad2deg_sparse_out
+    SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: rad2deg_sparse_csr_out
+  tags: pointwise
+
+- func: deg2rad(Tensor self) -> Tensor
+  variants: function, method
+  dispatch:
+    CompositeExplicitAutograd: deg2rad
+    SparseCPU, SparseCUDA, SparseMPS: deg2rad_sparse
+    SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: deg2rad_sparse_csr
+  tags: pointwise
+
+- func: deg2rad_(Tensor(a!) self) -> Tensor(a!)
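+# Illustration for is_pinned/pin_memory above (Python; assumed, not part of
+# the schema; requires a CUDA-enabled build): pinning returns a tensor in
+# page-locked host memory, which enables asynchronous host-to-device copies.
+#   >>> t = torch.randn(1024)
+#   >>> p = t.pin_memory()
+#   >>> p.is_pinned()
+#   True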
+ variants: function, method + dispatch: + CompositeExplicitAutograd: deg2rad_ + SparseCPU, SparseCUDA, SparseMPS: deg2rad_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: deg2rad_sparse_csr_ + tags: pointwise + +- func: deg2rad.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: deg2rad_out + SparseCPU, SparseCUDA, SparseMPS: deg2rad_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: deg2rad_sparse_csr_out + tags: pointwise + +- func: scalar_tensor(Scalar s, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: scalar_tensor + autogen: scalar_tensor.out + tags: core + +- func: rand.names(SymInt[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: rand + autogen: rand.names_out + tags: nondeterministic_seeded + +- func: rand.generator_with_names(SymInt[] size, *, Generator? generator, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + device_check: NoCheck + device_guard: False + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutograd: rand + autogen: rand.generator_with_names_out + +- func: rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: [core, nondeterministic_seeded] + dispatch: + CompositeExplicitAutograd: rand + +- func: rand.generator(SymInt[] size, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutograd: rand + +- func: rand.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutograd: rand_out + +- func: rand.generator_out(SymInt[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded + +- func: rand_like(Tensor self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + # NB: Although this composite mutates on the inside, it is + # non-differentiable so NonFunctional doesn't apply + CompositeExplicitAutograd: rand_like + autogen: rand_like.out + +- func: rand_like.generator(Tensor self, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutograd: rand_like + autogen: rand_like.generator_out + +- func: randint(SymInt high, SymInt[] size, *, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutograd: randint + +- func: randint.generator(SymInt high, SymInt[] size, *, Generator? generator, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutograd: randint + +- func: randint.low(SymInt low, SymInt high, SymInt[] size, *, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? 
pin_memory=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutograd: randint + +- func: randint.low_generator(SymInt low, SymInt high, SymInt[] size, *, Generator? generator, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutograd: randint + +- func: randint.out(SymInt high, SymInt[] size, *, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutograd: randint_out + +- func: randint.generator_out(SymInt high, SymInt[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutograd: randint_out + +- func: randint.low_out(SymInt low, SymInt high, SymInt[] size, *, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutograd: randint_out + +- func: randint.low_generator_out(SymInt low, SymInt high, SymInt[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutograd: randint_out + +- func: randint_like(Tensor self, SymInt high, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + # NB: Although this composite mutates on the inside, it is + # non-differentiable so NonFunctional doesn't apply + CompositeExplicitAutograd: randint_like + autogen: randint_like.out + +- func: randint_like.generator(Tensor self, SymInt high, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + # NB: Although this composite mutates on the inside, it is + # non-differentiable so NonFunctional doesn't apply + CompositeExplicitAutograd: randint_like + autogen: randint_like.generator_out + +- func: randint_like.Tensor(Tensor self, Tensor high, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + # NB: Although this composite mutates on the inside, it is + # non-differentiable so NonFunctional doesn't apply + CompositeExplicitAutograd: randint_like + autogen: randint_like.Tensor_out + +- func: randint_like.Tensor_generator(Tensor self, Tensor high, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + # NB: Although this composite mutates on the inside, it is + # non-differentiable so NonFunctional doesn't apply + CompositeExplicitAutograd: randint_like + autogen: randint_like.Tensor_generator_out + +- func: randint_like.low_dtype(Tensor self, SymInt low, SymInt high, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + # NB: Although this composite mutates on the inside, it is + # non-differentiable so NonFunctional doesn't apply + CompositeExplicitAutograd: randint_like + autogen: randint_like.low_dtype_out + +- func: randint_like.low_generator_dtype(Tensor self, SymInt low, SymInt high, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? 
device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + # NB: Although this composite mutates on the inside, it is + # non-differentiable so NonFunctional doesn't apply + CompositeExplicitAutograd: randint_like + autogen: randint_like.low_generator_dtype_out + +- func: randn(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: [core, nondeterministic_seeded] + dispatch: + CompositeExplicitAutograd: randn + +- func: randn.generator(SymInt[] size, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutograd: randn + +- func: randn.names(SymInt[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: nondeterministic_seeded + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: randn + autogen: randn.names_out + +- func: randn.generator_with_names(SymInt[] size, *, Generator? generator, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: nondeterministic_seeded + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: randn + autogen: randn.generator_with_names_out + +- func: randn.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded + +- func: randn.generator_out(SymInt[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded + +- func: randn_like(Tensor self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + # NB: Although this composite mutates on the inside, it is + # non-differentiable so NonFunctional doesn't apply + CompositeExplicitAutograd, CompositeImplicitAutogradNestedTensor: randn_like + autogen: randn_like.out + +- func: randn_like.generator(Tensor self, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + # NB: Although this composite mutates on the inside, it is + # non-differentiable so NonFunctional doesn't apply + CompositeExplicitAutograd, CompositeImplicitAutogradNestedTensor: randn_like + autogen: randn_like.generator_out + +- func: randperm(SymInt n, *, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: [core, nondeterministic_seeded] + dispatch: + CompositeExplicitAutograd: randperm + +- func: randperm.generator(SymInt n, *, Generator? generator, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutograd: randperm + +- func: randperm.out(SymInt n, *, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutograd: randperm_out + +- func: randperm.generator_out(SymInt n, *, Generator? generator, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded + dispatch: + CPU: randperm_out_cpu + CUDA: randperm_out_cuda + MPS: randperm_out_mps + +- func: range.step(Scalar start, Scalar end, Scalar step=1, *, ScalarType? dtype=None, Layout? 
layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: range + +- func: range(Scalar start, Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: range + +- func: range.out_(Scalar start, Scalar end, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: range_out_no_step + +- func: range.out(Scalar start, Scalar end, Scalar step=1, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, Meta: range_out + CUDA: range_cuda_out + MPS: range_mps_out + cpp_no_default_args: ['step'] + +- func: ravel(Tensor(a) self) -> Tensor(a) + variants: function, method + +- func: reciprocal(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: reciprocal.out + variants: function, method + tags: [core, pointwise] + +- func: reciprocal_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: reciprocal.out + variants: function, method + tags: pointwise + +- func: reciprocal.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MTIA: reciprocal_out + MPS: reciprocal_out_mps + tags: pointwise + +- func: neg(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: neg.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: neg_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: neg_sparse_csr + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_neg + tags: [core, pointwise] + +- func: neg_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: neg.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: neg_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: neg_sparse_csr_ + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_neg_ + tags: pointwise + +- func: neg.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS, MTIA: neg_out + SparseCPU, SparseCUDA, SparseMPS: neg_out_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: neg_sparse_csr_out + tags: pointwise +# Alias for neg + +- func: negative(Tensor self) -> Tensor + variants: function, method + +- func: negative_(Tensor(a!) self) -> Tensor(a!) + variants: function, method + +- func: negative.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + +- func: repeat(Tensor self, SymInt[] repeats) -> Tensor + variants: method # This is method-only to match the previous tensor API. In the future we could make this a function too. + dispatch: + CompositeExplicitAutograd: repeat + MPS: repeat_mps + autogen: repeat.out + tags: core + +- func: repeat_interleave.Tensor(Tensor repeats, *, SymInt? output_size=None) -> Tensor + variants: function + dispatch: + CPU: repeat_interleave_cpu + CUDA: repeat_interleave_cuda + MPS: repeat_interleave_mps + tags: dynamic_output_shape + autogen: repeat_interleave.Tensor_out + +- func: repeat_interleave.self_Tensor(Tensor self, Tensor repeats, int? dim=None, *, SymInt? output_size=None) -> Tensor + variants: function, method + dispatch: + CompositeImplicitAutograd: repeat_interleave_symint + +- func: repeat_interleave.self_int(Tensor self, SymInt repeats, int? dim=None, *, SymInt? 
output_size=None) -> Tensor + variants: function, method + dispatch: + CompositeImplicitAutograd: repeat_interleave_symint + +- func: reshape(Tensor(a) self, SymInt[] shape) -> Tensor(a) + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeImplicitAutograd: reshape_symint + CompositeImplicitAutogradNestedTensor: reshape_nested_symint + +- func: _reshape_copy(Tensor self, SymInt[] size) -> Tensor + variants: function + dispatch: + CompositeExplicitAutograd: _reshape_copy_symint + +# NOTE [ _reshape_alias ] is meant to be used in the implementation of reshape. +# They are not user-facing, hence the leading underscore. Please don't use it +# anywhere else. +- func: _reshape_alias(Tensor(a) self, SymInt[] size, SymInt[] stride) -> Tensor(a) + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CPU, CUDA, Meta, QuantizedCPU, QuantizedCUDA, ZeroTensor, MPS, MTIA: _reshape_alias + # We don't need to support mkldnn since this is handled explicitly by the reshape operator. + +- func: _mkldnn_reshape(Tensor self, int[] shape) -> Tensor + device_check: NoCheck + device_guard: False + dispatch: + MkldnnCPU: mkldnn_reshape + autogen: _mkldnn_reshape.out + +- func: reshape_as(Tensor(a) self, Tensor other) -> Tensor(a) + variants: method + device_check: NoCheck + device_guard: False + dispatch: + CompositeImplicitAutograd: reshape_as + CompositeImplicitAutogradNestedTensor: reshape_as_nested + +- func: round(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: round.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: round_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: round_sparse_csr + tags: [core, pointwise] + +- func: round_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: round.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: round_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: round_sparse_csr_ + tags: pointwise + +- func: round.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: round_out + SparseCPU, SparseCUDA, SparseMPS: round_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: round_sparse_csr_out + tags: pointwise + +- func: round.decimals(Tensor self, *, int decimals) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: round.decimals_out + variants: function, method + tags: pointwise + +- func: round_.decimals(Tensor(a!) self, *, int decimals) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: round.decimals_out + variants: function, method + tags: pointwise + +- func: round.decimals_out(Tensor self, *, int decimals, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: round_decimals_out + tags: pointwise + +- func: rrelu(Tensor self, Scalar lower=0.125, Scalar upper=0.3333333333333333, bool training=False, Generator? generator=None) -> Tensor + device_check: NoCheck # TensorIterator + tags: [pointwise, nondeterministic_seeded] + +- func: rrelu_(Tensor(a!) self, Scalar lower=0.125, Scalar upper=0.3333333333333333, bool training=False, Generator? generator=None) -> Tensor(a!) 
+ tags: nondeterministic_seeded + device_check: NoCheck # TensorIterator + +- func: relu(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CPU, CUDA: relu + MPS: relu_mps + MTIA: relu_mtia + MkldnnCPU: mkldnn_relu + QuantizedCPU: relu_quantized_cpu + QuantizedCUDA: relu_quantized_cuda + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_relu + SparseCPU, SparseCUDA, SparseMPS: relu_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: relu_sparse_csr + tags: [core, pointwise] + +- func: relu_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CPU, CUDA: relu_ + MPS: relu_mps_ + MTIA: relu_mtia_ + MkldnnCPU: mkldnn_relu_ + QuantizedCPU: relu_quantized_cpu_ + QuantizedCUDA: relu_quantized_cuda_ + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_relu_ + SparseCPU, SparseCUDA, SparseMPS: relu_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: relu_sparse_csr_ + autogen: relu.out + tags: pointwise + +- func: relu6(Tensor self) -> Tensor + python_module: nn + tags: pointwise + +- func: relu6_(Tensor(a!) self) -> Tensor(a!) + python_module: nn + +- func: prelu(Tensor self, Tensor weight) -> Tensor + variants: function, method + autogen: prelu.out + +- func: _prelu_kernel(Tensor self, Tensor weight) -> Tensor + dispatch: + CPU, CUDA: _prelu_kernel + QuantizedCPU: _prelu_kernel_quantized_cpu + MkldnnCPU: mkldnn_prelu + MPS: prelu_mps + +- func: _prelu_kernel_backward(Tensor grad_output, Tensor self, Tensor weight) -> (Tensor, Tensor) + dispatch: + CPU, CUDA: _prelu_kernel_backward + MkldnnCPU: mkldnn_prelu_backward + MPS: prelu_backward_mps + +- func: gelu.out(Tensor self, *, str approximate='none', Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + CPU: gelu_out_cpu + CUDA: gelu_out_cuda + MPS: gelu_out_mps + +- func: gelu_(Tensor(a!) self, *, str approximate='none') -> Tensor(a!) + structured_delegate: gelu.out + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + QuantizedCPU: gelu_quantized_cpu_ + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_gelu_ + +- func: gelu(Tensor self, *, str approximate='none') -> Tensor + structured_delegate: gelu.out + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + MkldnnCPU: mkldnn_gelu + QuantizedCPU: gelu_quantized_cpu + QuantizedCUDA: gelu_quantized_cuda + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_gelu + tags: [core, pointwise] + +- func: gelu_backward.grad_input(Tensor grad_output, Tensor self, *, str approximate='none', Tensor(a!) grad_input) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + python_module: nn + dispatch: + CPU: gelu_backward_out_cpu + CUDA: gelu_backward_out_cuda + MPS: gelu_backward_out_mps + +- func: gelu_backward(Tensor grad_output, Tensor self, *, str approximate='none') -> Tensor + structured_delegate: gelu_backward.grad_input + python_module: nn + dispatch: + MkldnnCPU: mkldnn_gelu_backward + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: gelu_backwards_nested + tags: pointwise + +- func: infinitely_differentiable_gelu_backward(Tensor grad, Tensor self) -> Tensor + variants: function + python_module: nn + device_check: NoCheck + device_guard: False + +- func: hardshrink.out(Tensor self, Scalar lambd=0.5, *, Tensor(a!) out) -> Tensor(a!) 
+ structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MPS: hardshrink_out + +- func: hardshrink(Tensor self, Scalar lambd=0.5) -> Tensor + structured_delegate: hardshrink.out + device_check: NoCheck # TensorIterator + variants: function, method + tags: pointwise + +- func: hardshrink_backward.grad_input(Tensor grad_out, Tensor self, Scalar lambd, *, Tensor(a!) grad_input) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: hardshrink_backward_out + +- func: hardshrink_backward(Tensor grad_out, Tensor self, Scalar lambd) -> Tensor + structured_delegate: hardshrink_backward.grad_input + variants: function, method + +- func: rsqrt(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: rsqrt.out + variants: function, method + tags: [core, pointwise] + +- func: rsqrt_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: rsqrt.out + variants: function, method + tags: pointwise + +- func: rsqrt.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS, MTIA: rsqrt_out + tags: pointwise + +- func: select.Dimname(Tensor(a) self, Dimname dim, int index) -> Tensor(a) + variants: function, method + device_check: NoCheck + device_guard: False + +- func: select.int(Tensor(a) self, int dim, SymInt index) -> Tensor(a) + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: select_symint + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: select_sparse_csr + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: select_nested + tags: core + +- func: select_backward(Tensor grad_output, SymInt[] input_sizes, int dim, SymInt index) -> Tensor + variants: function + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutogradNonFunctional: select_backward_symint + autogen: select_backward.out + +- func: _nested_select_backward(Tensor grad_output, Tensor self, int dim, SymInt index) -> Tensor + variants: function + device_check: NoCheck + device_guard: False + dispatch: + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: _nested_select_backward_symint + +- func: selu(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + tags: pointwise + +- func: selu_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + +- func: celu(Tensor self, Scalar alpha=1.0) -> Tensor + device_check: NoCheck # TensorIterator + dispatch: + CompositeExplicitAutograd: celu + tags: pointwise + +- func: celu_(Tensor(a!) self, Scalar alpha=1.0) -> Tensor(a!) + device_check: NoCheck # TensorIterator + dispatch: + CompositeExplicitAutograd: celu_ + autogen: celu.out + +- func: silu(Tensor self) -> Tensor + structured_delegate: silu.out + python_module: nn + dispatch: + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_silu + tags: pointwise + +- func: silu_(Tensor(a!) self) -> Tensor(a!) + structured_delegate: silu.out + python_module: nn + dispatch: + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_silu_ + tags: pointwise + +- func: silu.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) 
+ structured: True + structured_inherits: TensorIteratorBase + python_module: nn + dispatch: + CPU, CUDA, MTIA: silu_out + MPS: silu_out_mps + tags: pointwise + +- func: silu_backward.grad_input(Tensor grad_output, Tensor self, *, Tensor(a!) grad_input) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + python_module: nn + dispatch: + CPU, CUDA: silu_backward_out + MPS: silu_backward_out_mps + tags: pointwise + +- func: silu_backward(Tensor grad_output, Tensor self) -> Tensor + structured_delegate: silu_backward.grad_input + python_module: nn + dispatch: + CompositeImplicitAutograd: math_silu_backward + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: silu_backward_nested + tags: pointwise + +- func: mish(Tensor self) -> Tensor + structured_delegate: mish.out + python_module: nn + tags: pointwise + +- func: mish_(Tensor(a!) self) -> Tensor(a!) + structured_delegate: mish.out + python_module: nn + +- func: mish.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + python_module: nn + dispatch: + CPU, CUDA: mish_out + MPS: mish_out_mps + +- func: mish_backward(Tensor grad_output, Tensor self) -> Tensor + python_module: nn + dispatch: + CPU, CUDA: mish_backward + MPS: mish_backward_mps + CompositeImplicitAutograd: math_mish_backward + +- func: sigmoid(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: sigmoid.out + variants: function, method + dispatch: + QuantizedCPU: sigmoid_quantized_cpu + MkldnnCPU: mkldnn_sigmoid + tags: [core, pointwise] + +- func: sigmoid_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: sigmoid.out + variants: function, method + dispatch: + MkldnnCPU: mkldnn_sigmoid_ + tags: pointwise + +- func: sigmoid.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: sigmoid_out + tags: pointwise + +- func: logit(Tensor self, float? eps=None) -> Tensor + variants: function, method + dispatch: + CPU, CUDA, MTIA: logit + MPS: logit_mps + tags: pointwise + +- func: logit_(Tensor(a!) self, float? eps=None) -> Tensor(a!) + variants: function, method + dispatch: + CPU, CUDA: logit_ + tags: pointwise + +- func: logit.out(Tensor self, float? eps=None, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA: logit_out + MPS: logit_out_mps + tags: pointwise + +- func: sin(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: sin.out + variants: function, method + dispatch: + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sin_sparse_csr + SparseCPU, SparseCUDA, SparseMPS: sin_sparse + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_sin + tags: [core, pointwise] + +- func: sin_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: sin.out + variants: function, method + dispatch: + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sin_sparse_csr_ + SparseCPU, SparseCUDA, SparseMPS: sin_sparse_ + tags: pointwise + +- func: sin.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) 
+  device_check: NoCheck   # TensorIterator
+  structured: True
+  structured_inherits: TensorIteratorBase
+  dispatch:
+    CPU, CUDA, MPS, MTIA: sin_out
+    SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sin_sparse_csr_out
+    SparseCPU, SparseCUDA, SparseMPS: sin_sparse_out
+  tags: pointwise
+
+- func: sinc(Tensor self) -> Tensor
+  structured_delegate: sinc.out
+  variants: function, method
+  tags: pointwise
+
+- func: sinc_(Tensor(a!) self) -> Tensor(a!)
+  structured_delegate: sinc.out
+  variants: function, method
+  tags: pointwise
+
+- func: sinc.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
+  structured: True
+  structured_inherits: TensorIteratorBase
+  dispatch:
+    CPU, CUDA, MPS: sinc_out
+  tags: pointwise
+
+- func: sinh(Tensor self) -> Tensor
+  device_check: NoCheck   # TensorIterator
+  structured_delegate: sinh.out
+  variants: function, method
+  dispatch:
+    SparseCPU, SparseCUDA, SparseMPS: sinh_sparse
+    SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sinh_sparse_csr
+  tags: [core, pointwise]
+
+- func: sinh_(Tensor(a!) self) -> Tensor(a!)
+  device_check: NoCheck   # TensorIterator
+  structured_delegate: sinh.out
+  variants: function, method
+  dispatch:
+    SparseCPU, SparseCUDA, SparseMPS: sinh_sparse_
+    SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sinh_sparse_csr_
+  tags: pointwise
+
+- func: sinh.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
+  device_check: NoCheck   # TensorIterator
+  structured: True
+  structured_inherits: TensorIteratorBase
+  dispatch:
+    CPU, CUDA, MPS: sinh_out
+    SparseCPU, SparseCUDA, SparseMPS: sinh_sparse_out
+    SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sinh_sparse_csr_out
+  tags: pointwise
+
+# Returns a copy of this `Variable` that is detached from its autograd graph.
+# This method is OK to call if the `Variable` is a view.
+#
+# NOTE: Previously, if we changed the tensor metadata (e.g. sizes / strides /
+# storage / storage_offset) of a tensor created from `detach()`, that metadata
+# was also updated in the original tensor. The new behavior is that metadata
+# changes to the detached tensor no longer update the original tensor; in the
+# `detach()` function we set `allow_tensor_metadata_change_` to false to make
+# such changes explicitly illegal, in order to prevent users from changing
+# metadata of the detached tensor and expecting the original tensor to also
+# be updated.
+- func: detach(Tensor(a) self) -> Tensor(a)
+  variants: function, method
+  dispatch:
+    CompositeExplicitAutograd: detach
+    NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: detach
+
+# Like `detach()`, but modifies this `Variable` in-place. This method may
+# only be called on non-view `Variable`s. You can use `is_view()` to check
+# this. If this `Variable` is a view, throws an `std::runtime_error()`.
+- func: detach_(Tensor(a!) self) -> Tensor(a!)
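+# Illustration of the detach() metadata note above (Python; assumed behavior,
+# not part of the schema): metadata changes on the detached tensor are
+# rejected rather than silently propagated back to the original.
+#   >>> base = torch.zeros(4, requires_grad=True)
+#   >>> d = base.detach()
+#   >>> d.resize_(8)   # RuntimeError: not allowed on a tensor created from
+#   #                  # .detach(); `base` is never affected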
+ variants: function, method + tags: inplace_view + dispatch: + CompositeExplicitAutograd: detach_ + +- func: size.int(Tensor self, int dim) -> int + variants: function + device_check: NoCheck + device_guard: False + manual_cpp_binding: True + +- func: size.Dimname(Tensor self, Dimname dim) -> int + variants: function, method + device_check: NoCheck + device_guard: False + +- func: sym_size.int(Tensor self, int dim) -> SymInt + variants: function + device_check: NoCheck + device_guard: False + tags: core + manual_cpp_binding: True + +- func: sym_is_contiguous(Tensor self, MemoryFormat memory_format=contiguous_format) -> SymBool + variants: function + device_check: NoCheck + device_guard: False + tags: core + manual_cpp_binding: True + +- func: sym_numel(Tensor self) -> SymInt + variants: function + device_check: NoCheck + device_guard: False + tags: core + manual_cpp_binding: True + +- func: sym_storage_offset(Tensor self) -> SymInt + variants: function + device_check: NoCheck + device_guard: False + tags: core + manual_cpp_binding: True + +- func: slice.Tensor(Tensor(a) self, int dim=0, SymInt? start=None, SymInt? end=None, SymInt step=1) -> Tensor(a) + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: slice + tags: core + +# NOTE: The implementation of split_with_sizes bypasses the dispatcher to call this; undo +# that if adding specific implementations here! + +- func: slice_backward(Tensor grad_output, SymInt[] input_sizes, int dim, SymInt start, SymInt end, SymInt step) -> Tensor + variants: function + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: slice_backward + autogen: slice_backward.out + +# NB: This op exists to back the implementation of reverse view_funcs for various views (chunk, +# slice.Tensor, split_with_sizes, et al.). Currently, these are only used during fake-ification +# of PT2 graph input subclass instances that are views. This means: +# * This op shouldn't really show up in eager mode (so e.g. XLA shouldn't have to implement it) +# * This op shouldn't show up in a PT2 graph (so a PT2 backend shouldn't have to implement it) +# * A subclass will have to implement this to work in PT2 if a subclass view is used as a graph +# input AND the view utilizes this op in its inverse. The idea is that slice_inverse() is +# easier to implement for a subclass than as_strided() +- func: slice_inverse(Tensor(a) self, Tensor src, int dim=0, SymInt? start=None, SymInt? end=None, SymInt step=1) -> Tensor(a) + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: slice_inverse_symint + +- func: slice_scatter(Tensor self, Tensor src, int dim=0, SymInt? start=None, SymInt? 
end=None, SymInt step=1) -> Tensor + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutogradNonFunctional: slice_scatter + autogen: slice_scatter.out + tags: [core, view_copy] + +- func: select_scatter(Tensor self, Tensor src, int dim, SymInt index) -> Tensor + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutogradNonFunctional: select_scatter_symint + autogen: select_scatter.out + tags: core + +- func: diagonal_scatter(Tensor self, Tensor src, int offset=0, int dim1=0, int dim2=1) -> Tensor + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutogradNonFunctional: diagonal_scatter + autogen: diagonal_scatter.out + +- func: as_strided_scatter(Tensor self, Tensor src, SymInt[] size, SymInt[] stride, SymInt? storage_offset=None) -> Tensor + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutogradNonFunctional: as_strided_scatter_symint + autogen: as_strided_scatter.out + +- func: smm(Tensor self, Tensor mat2) -> Tensor + variants: function, method + +# softmax allows positional dtype, unlike most operators, because kwonly is BC-breaking when loading jit models. +- func: softmax.int(Tensor self, int dim, ScalarType? dtype=None) -> Tensor + variants: function, method + +- func: softmax.int_out(Tensor self, int dim, ScalarType? dtype=None, *, Tensor(a!) out) -> Tensor(a!) + variants: function + dispatch: + CompositeExplicitAutograd: softmax_out + +- func: softmax.Dimname(Tensor self, Dimname dim, *, ScalarType? dtype=None) -> Tensor + variants: function, method + +- func: _softmax(Tensor self, int dim, bool half_to_float) -> Tensor + structured_delegate: _softmax.out + dispatch: + MkldnnCPU: mkldnn_softmax + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: softmax_nested + tags: core + +- func: _softmax.out(Tensor self, int dim, bool half_to_float, *, Tensor(a!) out) -> Tensor(a!) + structured: True + dispatch: + CPU: softmax_cpu_out + CUDA: softmax_cuda_out + MPS: softmax_mps_out + +- func: _softmax_backward_data(Tensor grad_output, Tensor output, int dim, ScalarType input_dtype) -> Tensor + structured_delegate: _softmax_backward_data.out + dispatch: + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: nested_softmax_backward + +- func: _softmax_backward_data.out(Tensor grad_output, Tensor output, int dim, ScalarType input_dtype, *, Tensor(a!) grad_input) -> Tensor(a!) 
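+# Illustration for the softmax comment above (Python; assumed, not part of
+# the schema): `dtype` may be passed positionally as well as by keyword.
+#   >>> x = torch.randn(2, 3)
+#   >>> torch.equal(x.softmax(1, torch.float64), x.softmax(1, dtype=torch.float64))
+#   True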
+ structured: True + dispatch: + CPU: softmax_backward_cpu_out + CUDA: softmax_backward_cuda_out + MPS: softmax_backward_mps_out + +- func: unsafe_split.Tensor(Tensor self, SymInt split_size, int dim=0) -> Tensor[] + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: unsafe_split + autogen: unsafe_split.Tensor_out + +- func: split.Tensor(Tensor(a -> *) self, SymInt split_size, int dim=0) -> Tensor(a)[] + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: split + +- func: split.sizes(Tensor(a -> *) self, SymInt[] split_size, int dim=0) -> Tensor(a)[] + variants: function, method + device_guard: False + dispatch: + CompositeImplicitAutograd: split_symint + +- func: unsafe_split_with_sizes(Tensor self, SymInt[] split_sizes, int dim=0) -> Tensor[] + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: unsafe_split_with_sizes + autogen: unsafe_split_with_sizes.out + +- func: split_with_sizes(Tensor(a -> *) self, SymInt[] split_sizes, int dim=0) -> Tensor(a)[] + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: split_with_sizes + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: split_with_sizes_nested + tags: core + +- func: hsplit.int(Tensor(a -> *) self, int sections) -> Tensor(a)[] + variants: function, method + +- func: hsplit.array(Tensor(a -> *) self, int[] indices) -> Tensor(a)[] + variants: function, method + +- func: vsplit.int(Tensor(a -> *) self, int sections) -> Tensor(a)[] + variants: function, method + +- func: vsplit.array(Tensor(a -> *) self, int[] indices) -> Tensor(a)[] + variants: function, method + +- func: dsplit.int(Tensor(a -> *) self, int sections) -> Tensor(a)[] + variants: function, method + +- func: dsplit.array(Tensor(a -> *) self, int[] indices) -> Tensor(a)[] + variants: function, method + +- func: squeeze(Tensor(a) self) -> Tensor(a) + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: squeeze + QuantizedCPU, QuantizedCUDA: squeeze_quantized + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: squeeze_nested + +- func: squeeze.dim(Tensor(a) self, int dim) -> Tensor(a) + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: squeeze + QuantizedCPU, QuantizedCUDA: squeeze_quantized + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: squeeze_dim_nested + tags: core + +- func: squeeze.dimname(Tensor(a) self, Dimname dim) -> Tensor(a) + variants: function, method + device_check: NoCheck + device_guard: False + + +- func: squeeze.dims(Tensor(a) self, int[] dim) -> Tensor(a) + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: squeeze + QuantizedCPU, QuantizedCUDA: squeeze_quantized + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: squeeze_dim_nested + tags: core + +- func: squeeze_(Tensor(a!) self) -> Tensor(a!) + variants: method + device_check: NoCheck + device_guard: False + tags: inplace_view + dispatch: + CompositeExplicitAutograd: squeeze_ + +- func: squeeze_.dim(Tensor(a!) self, int dim) -> Tensor(a!) + variants: method + device_check: NoCheck + device_guard: False + tags: inplace_view + dispatch: + CompositeExplicitAutograd: squeeze_ + +- func: squeeze_.dims(Tensor(a!) self, int[] dim) -> Tensor(a!) 
+ variants: method + device_check: NoCheck + device_guard: False + tags: inplace_view + dispatch: + CompositeExplicitAutograd: squeeze_ + +- func: squeeze_.dimname(Tensor(a!) self, Dimname dim) -> Tensor(a!) + variants: method + device_check: NoCheck + device_guard: False + tags: inplace_view + +- func: sspaddmm(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> Tensor + variants: function, method + +- func: sspaddmm.out(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU: _sspaddmm_out_only_sparse + CUDA: _sspaddmm_out_only_sparse_cuda + SparseCPU: _sspaddmm_out_cpu + SparseCUDA: _sspaddmm_out_cuda + +- func: _chunk_cat(Tensor[] tensors, int dim, int num_chunks) -> Tensor + dispatch: + CompositeExplicitAutograd: _chunk_cat + CUDA: _chunk_cat_cuda + +- func: _chunk_cat.out(Tensor[] tensors, int dim, int num_chunks, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: _chunk_cat_out + CUDA: _chunk_cat_out_cuda + +- func: stack(Tensor[] tensors, int dim=0) -> Tensor + dispatch: + CompositeExplicitAutograd: stack + +- func: stack.out(Tensor[] tensors, int dim=0, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: stack_out + +- func: _stack(Tensor[] tensors, int dim=0) -> Tensor + dispatch: # match the backends supported by _cat + CPU: _stack_cpu + CompositeExplicitAutograd: _stack + +- func: _stack.out(Tensor[] tensors, int dim=0, *, Tensor(a!) out) -> Tensor(a!) + dispatch: # match the backends supported by _cat_out + CPU: _stack_out_cpu + CompositeExplicitAutograd: _stack_out + +- func: hstack(Tensor[] tensors) -> Tensor + +- func: hstack.out(Tensor[] tensors, *, Tensor(a!) out) -> Tensor(a!) + +- func: vstack(Tensor[] tensors) -> Tensor + +- func: vstack.out(Tensor[] tensors, *, Tensor(a!) out) -> Tensor(a!) + +- func: dstack(Tensor[] tensors) -> Tensor + +- func: dstack.out(Tensor[] tensors, *, Tensor(a!) out) -> Tensor(a!) + +# Overload without center & pad mode, needed for forward-compatibility +- func: stft(Tensor self, int n_fft, int? hop_length=None, int? win_length=None, Tensor? window=None, bool normalized=False, bool? onesided=None, bool? return_complex=None, bool? align_to_window=None) -> Tensor + variants: function, method + cpp_no_default_args: ['hop_length', 'win_length', 'window', 'normalized'] + +- func: stft.center(Tensor self, int n_fft, int? hop_length=None, int? win_length=None, Tensor? window=None, bool center=True, str pad_mode="reflect", bool normalized=False, bool? onesided=None, bool? return_complex=None, bool? align_to_window=None) -> Tensor + variants: function, method + +- func: istft(Tensor self, int n_fft, int? hop_length=None, int? win_length=None, Tensor? window=None, bool center=True, bool normalized=False, bool? onesided=None, int? length=None, bool return_complex=False) -> Tensor + variants: function, method + +- func: stride.int(Tensor self, int dim) -> int + variants: function + device_check: NoCheck + device_guard: False + manual_cpp_binding: True + +- func: stride.Dimname(Tensor self, Dimname dim) -> int + variants: function, method + device_check: NoCheck + device_guard: False + +- func: sym_stride.int(Tensor self, int dim) -> SymInt + variants: function + device_check: NoCheck + device_guard: False + tags: core + manual_cpp_binding: True + +- func: sum(Tensor self, *, ScalarType? 
dtype=None) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: sum + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: sum_coo + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sum_csr + autogen: sum.out + tags: reduction + +- func: sum.dim_IntList(Tensor self, int[1]? dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + # TODO: Align the signature of sum.dim_IntList and _sparse_csr_sum.dim_dtype + structured_delegate: sum.IntList_out + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + NestedTensorCPU: NestedTensor_sum_dim_CPU + SparseCPU, SparseCUDA, SparseMPS: sum_sparse_coo + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sum_sparse_compressed + tags: [core, reduction] + +- func: sum.dim_DimnameList(Tensor self, Dimname[1] dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + tags: reduction + +- func: sum.IntList_out(Tensor self, int[1]? dim, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) + structured: True + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA: sum_out + MPS: sum_out_mps + tags: reduction + +- func: sum.DimnameList_out(Tensor self, Dimname[1] dim, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + tags: reduction + +# TODO: this function will be replaced once nested expand semantics have been settled on +- func: _nested_sum_backward(Tensor grad, Tensor self, int[1]? dim, bool keepdim=False) -> Tensor + dispatch: + NestedTensorCPU: _nested_sum_backward_cpu + +- func: nansum(Tensor self, int[1]? dim=None, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + variants: function, method + dispatch: + CPU, CUDA: nansum + MPS: nansum_mps + tags: reduction + +- func: nansum.out(Tensor self, int[1]? dim=None, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA: nansum_out + MPS: nansum_out_mps + tags: reduction + +- func: hash_tensor(Tensor self, int[1] dim=[], *, bool keepdim=False, int mode=0) -> Tensor + variants: function, method + structured_delegate: hash_tensor.out + +- func: hash_tensor.out(Tensor self, int[1] dim=[], *, bool keepdim=False, int mode=0, Tensor(a!) out) -> Tensor(a!) + structured: True + dispatch: + CPU, CUDA: hash_tensor_out + +- func: sum_to_size(Tensor self, SymInt[] size) -> Tensor + variants: method + device_check: NoCheck + device_guard: False + dispatch: + CompositeImplicitAutograd: sum_to_size_symint + +- func: sqrt(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: sqrt.out + variants: function, method + dispatch: + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_sqrt + SparseCPU, SparseCUDA, SparseMPS: sqrt_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sqrt_sparse_csr + tags: [core, pointwise] + +- func: sqrt_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: sqrt.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: sqrt_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sqrt_sparse_csr_ + tags: pointwise + +- func: sqrt.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) 
+ device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS, MTIA: sqrt_out + SparseCPU, SparseCUDA, SparseMPS: sqrt_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sqrt_sparse_csr_out + tags: pointwise + +- func: square(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + tags: pointwise + +- func: square_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function, method + tags: pointwise + +- func: square.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + tags: pointwise + +- func: std(Tensor self, bool unbiased=True) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + cpp_no_default_args: ["unbiased"] + tags: reduction + +- func: std.dim(Tensor self, int[1]? dim, bool unbiased=True, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + cpp_no_default_args: ["unbiased"] + tags: reduction + +- func: std.correction(Tensor self, int[1]? dim=None, *, Scalar? correction=None, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CPU, CUDA: std + MPS: std_mps + QuantizedCPU: std_quantized_cpu + tags: reduction + +- func: std_mean(Tensor self, bool unbiased=True) -> (Tensor, Tensor) + device_check: NoCheck # TensorIterator + variants: function + cpp_no_default_args: ["unbiased"] + tags: reduction + +- func: std_mean.dim(Tensor self, int[1]? dim, bool unbiased=True, bool keepdim=False) -> (Tensor, Tensor) + device_check: NoCheck # TensorIterator + variants: function + cpp_no_default_args: ["unbiased"] + tags: reduction + +- func: std_mean.correction(Tensor self, int[1]? dim=None, *, Scalar? correction=None, bool keepdim=False) -> (Tensor, Tensor) + device_check: NoCheck # TensorIterator + variants: function + dispatch: + CPU, CUDA: std_mean + MPS: std_mean_mps + autogen: std_mean.correction_out + tags: reduction + +- func: std_mean.names_dim(Tensor self, Dimname[1] dim, bool unbiased=True, bool keepdim=False) -> (Tensor, Tensor) + device_check: NoCheck # TensorIterator + variants: function + cpp_no_default_args: ["unbiased"] + tags: reduction + +- func: std_mean.correction_names(Tensor self, Dimname[1] dim, *, Scalar? correction=None, bool keepdim=False) -> (Tensor, Tensor) + device_check: NoCheck # TensorIterator + variants: function + tags: reduction + +- func: std.out(Tensor self, int[1]? dim, bool unbiased=True, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + cpp_no_default_args: ["unbiased"] + tags: reduction + +- func: std.correction_out(Tensor self, int[1]? dim=None, *, Scalar? correction=None, bool keepdim=False, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA: std_out + QuantizedCPU: std_out_quantized_cpu + tags: reduction + +- func: std.names_dim(Tensor self, Dimname[1] dim, bool unbiased=True, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + cpp_no_default_args: ["unbiased"] + tags: reduction + +- func: std.names_out(Tensor self, Dimname[1] dim, bool unbiased=True, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + cpp_no_default_args: ["unbiased"] + tags: reduction + +- func: std.correction_names(Tensor self, Dimname[1] dim, *, Scalar? 
correction=None, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + tags: reduction + +- func: std.correction_names_out(Tensor self, Dimname[1] dim, *, Scalar? correction=None, bool keepdim=False, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function + tags: reduction + +- func: prod(Tensor self, *, ScalarType? dtype=None) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CPU, CUDA: prod + MPS: prod_mps + autogen: prod.out + tags: [core, reduction] + +- func: prod.dim_int(Tensor self, int dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + structured_delegate: prod.int_out + device_check: NoCheck # TensorIterator + variants: function, method + tags: [core, reduction] + +- func: prod.int_out(Tensor self, int dim, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) + structured: True + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA: prod_out + MPS: prod_out_mps + tags: reduction + +- func: prod.dim_Dimname(Tensor self, Dimname dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + tags: reduction + +- func: prod.Dimname_out(Tensor self, Dimname dim, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + tags: reduction + +- func: t(Tensor(a) self) -> Tensor(a) + device_check: NoCheck + device_guard: False + variants: function, method + dispatch: + CompositeExplicitAutograd: t + +- func: t_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck + device_guard: False + variants: method + tags: inplace_view + dispatch: + CompositeExplicitAutograd: t_ + +- func: tan(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: tan.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: tan_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: tan_sparse_csr + tags: [core, pointwise] + +- func: tan_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: tan.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: tan_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: tan_sparse_csr_ + tags: pointwise + +- func: tan.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: tan_out + SparseCPU, SparseCUDA, SparseMPS: tan_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: tan_sparse_csr_out + tags: pointwise + +- func: tanh(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: tanh.out + variants: function, method + dispatch: + QuantizedCPU: tanh_quantized_cpu + MkldnnCPU: mkldnn_tanh + SparseCPU, SparseCUDA, SparseMPS: tanh_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: tanh_sparse_csr + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_tanh + tags: [core, pointwise] + +- func: tanh_(Tensor(a!) self) -> Tensor(a!) 
+ device_check: NoCheck # TensorIterator + structured_delegate: tanh.out + variants: function, method + dispatch: + MkldnnCPU: mkldnn_tanh_ + SparseCPU, SparseCUDA, SparseMPS: tanh_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: tanh_sparse_csr_ + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_tanh_ + tags: pointwise + +- func: tanh.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS, MTIA: tanh_out + SparseCPU, SparseCUDA, SparseMPS: tanh_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: tanh_sparse_csr_out + tags: pointwise + +- func: tensordot(Tensor self, Tensor other, int[] dims_self, int[] dims_other) -> Tensor + variants: function + +- func: tensordot.out(Tensor self, Tensor other, int[] dims_self, int[] dims_other, *, Tensor(a!) out) -> Tensor(a!) + variants: function + +# TODO: namespace threshold in 'nn' +- func: threshold(Tensor self, Scalar threshold, Scalar value) -> Tensor + device_check: NoCheck # TensorIterator + variants: function + structured_delegate: threshold.out + dispatch: + QuantizedCPU: threshold_quantized_cpu + tags: pointwise + +- func: threshold_(Tensor(a!) self, Scalar threshold, Scalar value) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function + structured_delegate: threshold.out + +- func: threshold.out(Tensor self, Scalar threshold, Scalar value, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: threshold_out + MPS: threshold_out_mps + +- func: threshold_backward.grad_input(Tensor grad_output, Tensor self, Scalar threshold, *, Tensor(a!) grad_input) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: threshold_backward_out + MPS: threshold_backward_out_mps + SparseCPU, SparseCUDA: threshold_backward_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: threshold_backward_sparse_compressed_out + +- func: threshold_backward(Tensor grad_output, Tensor self, Scalar threshold) -> Tensor + variants: function + structured_delegate: threshold_backward.grad_input + dispatch: + MkldnnCPU: mkldnn_relu_backward + SparseCPU, SparseCUDA: threshold_backward_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: threshold_backward_sparse_compressed + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: threshold_backwards_nested + tags: pointwise + +- func: tile(Tensor self, SymInt[] dims) -> Tensor + variants: function, method + dispatch: + CompositeImplicitAutograd: tile_symint + +- func: transpose.int(Tensor(a) self, int dim0, int dim1) -> Tensor(a) + variants: function, method + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: transpose + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: transpose_nested + +- func: transpose.Dimname(Tensor(a) self, Dimname dim0, Dimname dim1) -> Tensor(a) + variants: function, method + device_check: NoCheck + device_guard: False + +- func: _mkldnn_transpose(Tensor self, int dim0, int dim1) -> Tensor + device_check: NoCheck + device_guard: False + dispatch: + MkldnnCPU: mkldnn_transpose + +- func: transpose_(Tensor(a!) self, int dim0, int dim1) -> Tensor(a!) + variants: method + device_check: NoCheck + device_guard: False + tags: inplace_view + dispatch: + CompositeExplicitAutograd: transpose_ + +- func: _mkldnn_transpose_(Tensor(a!) 
self, int dim0, int dim1) -> Tensor(a!) + device_check: NoCheck + device_guard: False + dispatch: + MkldnnCPU: mkldnn_transpose_ + autogen: _mkldnn_transpose.out + +- func: one_hot(Tensor self, int num_classes=-1) -> Tensor + python_module: nn + variants: function + tags: dynamic_output_shape + +- func: flip(Tensor self, int[] dims) -> Tensor + variants: function, method + dispatch: + CPU, QuantizedCPU, CUDA, QuantizedCUDA: flip + MPS: flip_mps + autogen: flip.out + tags: core + +- func: fliplr(Tensor self) -> Tensor + variants: function, method + +- func: flipud(Tensor self) -> Tensor + variants: function, method + +- func: roll(Tensor self, SymInt[1] shifts, int[1] dims=[]) -> Tensor + variants: function, method + dispatch: + CPU, MPS: roll + CUDA: roll_cuda + autogen: roll.out + +# default int[] value [0,1] should not add space after comma, since codegen parser uses ', ' to split args + +- func: rot90(Tensor self, int k=1, int[] dims=[0,1]) -> Tensor + variants: function, method + dispatch: + CompositeExplicitAutograd: rot90 + autogen: rot90.out + +- func: trapezoid.x(Tensor y, Tensor x, *, int dim=-1) -> Tensor + +- func: trapezoid.dx(Tensor y, *, Scalar dx=1, int dim=-1) -> Tensor + +- func: trapz.x(Tensor y, Tensor x, *, int dim=-1) -> Tensor + +- func: trapz.dx(Tensor y, *, float dx=1, int dim=-1) -> Tensor + +# Fused implementation detail for transformers. Adds in-projection bias to QKV and divides Q by sqrt(D/num_heads). +- func: _transform_bias_rescale_qkv(Tensor qkv, Tensor qkv_bias, int num_heads) -> (Tensor, Tensor, Tensor) + dispatch: + CPU, NestedTensorCPU: transform_bias_rescale_qkv_cpu + CUDA, NestedTensorCUDA: transform_bias_rescale_qkv_cuda + autogen: _transform_bias_rescale_qkv.out + +- func: _nested_tensor_from_mask(Tensor t, Tensor mask, bool mask_check=True) -> Tensor + dispatch: + CPU, CUDA: NestedTensor_nested_tensor_from_mask + autogen: _nested_tensor_from_mask.out + +- func: _nested_tensor_from_mask_left_aligned(Tensor t, Tensor mask) -> bool + dispatch: + CPU, CUDA: NestedTensor_nested_tensor_from_mask_left_aligned + +- func: _nested_from_padded(Tensor padded, Tensor cpu_nested_shape_example, bool fuse_transform_0213=False) -> Tensor + device_check: NoCheck # cpu_nested_shape_example will always be on CPU + dispatch: + CPU: nested_from_padded_generic + CUDA: nested_from_padded_cuda + autogen: _nested_from_padded.out + +# These private functions are temporary. They will be updated/deleted when nested tensors switch to using SymInts for their metadata representation +- func: _nested_tensor_size(Tensor self) -> Tensor + variants: method + dispatch: + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: _nested_tensor_size + autogen: _nested_tensor_size.out + +- func: _nested_tensor_strides(Tensor self) -> Tensor + variants: method + dispatch: + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: _nested_tensor_strides + autogen: _nested_tensor_strides.out + +- func: _nested_tensor_storage_offsets(Tensor self) -> Tensor + variants: method + dispatch: + NestedTensorCPU, NestedTensorCUDA, NestedTensorMeta: _nested_tensor_storage_offsets + autogen: _nested_tensor_storage_offsets.out + +# _nested_from_padded is not usable from Python, so +# _nested_from_padded_and_nested_example is available for testing. 
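+# (Illustrative only: a minimal Python sketch of the padded <-> nested round
+# trip these helpers support, using just the public nested API;
+# torch.nested.nested_tensor / torch.nested.to_padded_tensor are the public
+# entry points, not the private ops declared here.)
+#
+#   import torch
+#   nt = torch.nested.nested_tensor([torch.randn(2, 4), torch.randn(3, 4)])
+#   padded = torch.nested.to_padded_tensor(nt, padding=0.0)  # shape (2, 3, 4)
+#   # Going back from `padded` to a nested tensor is what _nested_from_padded
+#   # implements internally; it has no direct Python binding.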
+- func: _nested_from_padded_and_nested_example(Tensor padded, Tensor nt_example) -> Tensor
+  dispatch:
+    NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_from_padded_and_nested_example
+  autogen: _nested_from_padded_and_nested_example.out
+
+# The input arguments' types to this function are temporary. When nested tensors switch to using SymInts for their metadata representation
+# this will need to be updated
+- func: _nested_view_from_buffer(Tensor(a) self, Tensor nested_size, Tensor nested_strides, Tensor offsets) -> Tensor(a)
+  variants: function
+  device_check: NoCheck
+  dispatch:
+    CPU, CUDA: _nested_view_from_buffer
+
+- func: _nested_view_from_buffer_copy(Tensor self, Tensor nested_size, Tensor nested_strides, Tensor offsets) -> Tensor
+  variants: function
+  device_check: NoCheck
+  tags: view_copy
+  dispatch:
+    CompositeExplicitAutogradNonFunctional: _nested_view_from_buffer_copy
+  autogen: _nested_view_from_buffer_copy.out
+
+- func: _nested_view_from_jagged(Tensor(a) self, Tensor offsets, Tensor dummy, Tensor? lengths=None, int ragged_idx=1, Tensor? min_seqlen=None, Tensor? max_seqlen=None) -> Tensor(a)
+  variants: function
+  device_check: NoCheck
+  dispatch: {}
+
+- func: _nested_view_from_jagged_copy(Tensor self, Tensor offsets, Tensor dummy, Tensor? lengths=None, int ragged_idx=1, Tensor? min_seqlen=None, Tensor? max_seqlen=None) -> Tensor
+  variants: function
+  device_check: NoCheck
+  tags: view_copy
+  dispatch:
+    CompositeExplicitAutogradNonFunctional: _nested_view_from_jagged_copy
+  autogen: _nested_view_from_jagged_copy.out
+
+- func: _nested_get_values(Tensor(a) self) -> Tensor(a)
+  variants: function
+  device_check: NoCheck
+  dispatch: {}
+
+- func: _nested_get_values_copy(Tensor self) -> Tensor
+  variants: function
+  device_check: NoCheck
+  tags: view_copy
+  dispatch:
+    CompositeExplicitAutogradNonFunctional: _nested_get_values_copy
+  autogen: _nested_get_values_copy.out
+
+- func: _nested_get_offsets(Tensor self) -> Tensor
+  variants: function
+  device_check: NoCheck
+  dispatch: {}
+
+# returns undefined Tensor if no lengths present
+- func: _nested_get_lengths(Tensor self) -> Tensor
+  variants: function
+  device_check: NoCheck
+  dispatch: {}
+
+- func: _nested_get_ragged_idx(Tensor self) -> int
+  variants: function
+  device_check: NoCheck
+  dispatch: {}
+
+- func: _nested_get_min_seqlen(Tensor self) -> Tensor
+  variants: function
+  device_check: NoCheck
+  dispatch: {}
+
+- func: _nested_get_max_seqlen(Tensor self) -> Tensor
+  variants: function
+  device_check: NoCheck
+  dispatch: {}
+
+- func: _nested_get_jagged_dummy(Tensor any) -> Tensor
+  category_override: dummy
+  dispatch: {}
+
+- func: _nested_compute_contiguous_strides_offsets(Tensor nested_size) -> (Tensor, Tensor)
+  variants: function
+  device_check: NoCheck
+  dispatch:
+    CPU, CUDA: _nested_compute_contiguous_strides_offsets
+
+- func: _trilinear(Tensor i1, Tensor i2, Tensor i3, int[] expand1, int[] expand2, int[] expand3, int[] sumdim, int unroll_dim=1) -> Tensor
+  dispatch:
+    # calls unsqueeze
+    CompositeExplicitAutogradNonFunctional: _trilinear
+  autogen: _trilinear.out
+
+- func: triplet_margin_loss(Tensor anchor, Tensor positive, Tensor negative, float margin=1.0, float p=2, float eps=1e-06, bool swap=False, int reduction=Mean) -> Tensor
+
+- func: trunc(Tensor self) -> Tensor
+  structured_delegate: trunc.out
+  device_check: NoCheck # TensorIterator
+  variants: function, method
+  dispatch:
+    SparseCPU, SparseCUDA, SparseMPS: trunc_sparse
+    SparseCsrCPU, SparseCsrCUDA, SparseCsrMPS, SparseCsrMeta: trunc_sparse_csr
+  tags: [core, pointwise]
+
+- func: trunc_(Tensor(a!) self) -> Tensor(a!)
+  structured_delegate: trunc.out
+  device_check: NoCheck # TensorIterator
+  variants: function, method
+  dispatch:
+    SparseCPU, SparseCUDA, SparseMPS: trunc_sparse_
+    SparseCsrCPU, SparseCsrCUDA, SparseCsrMPS, SparseCsrMeta: trunc_sparse_csr_
+  tags: pointwise
+
+- func: trunc.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
+  structured: True
+  structured_inherits: TensorIteratorBase
+  device_check: NoCheck # TensorIterator
+  dispatch:
+    CPU, CUDA, MPS: trunc_out
+    SparseCPU, SparseCUDA, SparseMPS: trunc_sparse_out
+    SparseCsrCPU, SparseCsrCUDA, SparseCsrMPS, SparseCsrMeta: trunc_sparse_csr_out
+  tags: pointwise
+
+# Alias for trunc
+- func: fix(Tensor self) -> Tensor
+  variants: function, method
+
+- func: fix_(Tensor(a!) self) -> Tensor(a!)
+  variants: function, method
+
+- func: fix.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
+
+- func: type_as(Tensor self, Tensor other) -> Tensor
+  variants: method
+
+- func: _has_compatible_shallow_copy_type(Tensor self, Tensor from) -> bool
+  variants: function
+
+- func: _unique(Tensor self, bool sorted=True, bool return_inverse=False) -> (Tensor, Tensor)
+  variants: function
+  dispatch:
+    CPU: _unique_cpu
+    CUDA: _unique_cuda
+  autogen: _unique.out
+
+- func: unique_dim(Tensor self, int dim, bool sorted=True, bool return_inverse=False, bool return_counts=False) -> (Tensor, Tensor, Tensor)
+  variants: function
+  dispatch:
+    CPU: unique_dim_cpu
+    CUDA: unique_dim_cuda
+    MPS: unique_dim_mps
+  tags: dynamic_output_shape
+  autogen: unique_dim.out
+
+- func: unique_consecutive(Tensor self, bool return_inverse=False, bool return_counts=False, int? dim=None) -> (Tensor, Tensor, Tensor)
+  variants: function
+  dispatch:
+    CPU: unique_consecutive_cpu
+    CUDA: unique_consecutive_cuda
+    MPS: unique_consecutive_mps
+  tags: dynamic_output_shape
+  autogen: unique_consecutive.out
+
+- func: unique_dim_consecutive(Tensor self, int dim, bool return_inverse=False, bool return_counts=False) -> (Tensor, Tensor, Tensor)
+  variants: function
+  dispatch:
+    CPU: unique_dim_consecutive_cpu
+    CUDA: unique_dim_consecutive_cuda
+    MPS: unique_dim_consecutive_mps
+  tags: dynamic_output_shape
+  autogen: unique_dim_consecutive.out
+
+# _unique and _unique_dim are fragile; modifying them can easily cause internal breakage.
+# The operator below is a temporary hack for adding return_counts support.
+# Please don't rely on these two operators; they will be removed soon
+
+- func: _unique2(Tensor self, bool sorted=True, bool return_inverse=False, bool return_counts=False) -> (Tensor, Tensor, Tensor)
+  variants: function
+  dispatch:
+    CPU: _unique2_cpu
+    CUDA: _unique2_cuda
+    MPS: _unique2_mps
+  tags: dynamic_output_shape
+  autogen: _unique2.out
+
+- func: _unsafe_view(Tensor self, SymInt[] size) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: _unsafe_view
+  autogen: _unsafe_view.out
+
+- func: unsqueeze(Tensor(a) self, int dim) -> Tensor(a)
+  variants: function, method
+  device_check: NoCheck
+  device_guard: False
+  dispatch:
+    CompositeExplicitAutograd: unsqueeze
+    SparseCPU, SparseCUDA, SparseMPS: unsqueeze_sparse
+    QuantizedCPU, QuantizedCUDA: unsqueeze_quantized
+    NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: unsqueeze_nested
+  tags: core
+
+- func: unsqueeze_(Tensor(a!) self, int dim) -> Tensor(a!)
+  variants: method
+  device_check: NoCheck
+  device_guard: False
+  tags: inplace_view
+  dispatch:
+    CompositeExplicitAutograd: unsqueeze_
+
+- func: vander(Tensor x, int? 
N=None, bool increasing=False) -> Tensor + +- func: var(Tensor self, bool unbiased=True) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + cpp_no_default_args: ["unbiased"] + tags: reduction + +- func: var.dim(Tensor self, int[1]? dim, bool unbiased=True, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + tags: [core, reduction] + cpp_no_default_args: ["unbiased"] + +- func: var.correction(Tensor self, int[1]? dim=None, *, Scalar? correction=None, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CPU, CUDA: var + MPS: var_mps + MTIA: var_mtia + tags: [core, reduction] + +- func: var.out(Tensor self, int[1]? dim, bool unbiased=True, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + cpp_no_default_args: ["unbiased"] + tags: reduction + +- func: var.correction_out(Tensor self, int[1]? dim=None, *, Scalar? correction=None, bool keepdim=False, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA: var_out + tags: reduction + +- func: var.names_dim(Tensor self, Dimname[1] dim, bool unbiased=True, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + cpp_no_default_args: ["unbiased"] + tags: reduction + +- func: var.names_out(Tensor self, Dimname[1] dim, bool unbiased=True, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + cpp_no_default_args: ["unbiased"] + tags: reduction + +- func: var.correction_names(Tensor self, Dimname[1] dim, *, Scalar? correction=None, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + tags: reduction + +- func: var.correction_names_out(Tensor self, Dimname[1] dim, *, Scalar? correction=None, bool keepdim=False, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function + tags: reduction + +- func: var_mean(Tensor self, bool unbiased=True) -> (Tensor, Tensor) + device_check: NoCheck # TensorIterator + variants: function + cpp_no_default_args: ["unbiased"] + tags: reduction + +- func: var_mean.dim(Tensor self, int[1]? dim, bool unbiased=True, bool keepdim=False) -> (Tensor, Tensor) + device_check: NoCheck # TensorIterator + variants: function + cpp_no_default_args: ["unbiased"] + tags: reduction + +- func: var_mean.correction(Tensor self, int[1]? dim=None, *, Scalar? correction=None, bool keepdim=False) -> (Tensor, Tensor) + device_check: NoCheck # TensorIterator + variants: function + dispatch: + CPU, CUDA: var_mean + MPS: var_mean_mps + autogen: var_mean.correction_out + tags: reduction + +- func: var_mean.names_dim(Tensor self, Dimname[1] dim, bool unbiased=True, bool keepdim=False) -> (Tensor, Tensor) + device_check: NoCheck # TensorIterator + variants: function + cpp_no_default_args: ["unbiased"] + tags: reduction + +- func: var_mean.correction_names(Tensor self, Dimname[1] dim, *, Scalar? 
correction=None, bool keepdim=False) -> (Tensor, Tensor) + device_check: NoCheck # TensorIterator + variants: function + tags: reduction + +- func: view_as(Tensor(a) self, Tensor other) -> Tensor(a) + variants: method + device_check: NoCheck + device_guard: False + +- func: where.self(Tensor condition, Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CPU, CUDA, MPS, MTIA: where + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_where + tags: [core, pointwise] + +- func: where.self_out(Tensor condition, Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MPS, MTIA: where_self_out + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_where_out + +- func: where.ScalarSelf(Tensor condition, Scalar self, Tensor other) -> Tensor + variants: function + +- func: where.ScalarOther(Tensor condition, Tensor self, Scalar other) -> Tensor + variants: function, method + +- func: where.Scalar(Tensor condition, Scalar self, Scalar other) -> Tensor + variants: function + +- func: where(Tensor condition) -> Tensor[] + device_check: NoCheck # TensorIterator + variants: function + +- func: norm_except_dim(Tensor v, int pow=2, int dim=0) -> Tensor + variants: function + +# VariableType::_weight_norm does not want to be given a gap in the autograd graph, +# so we don't define "dispatch" variants for it. +- func: _weight_norm(Tensor v, Tensor g, int dim=0) -> Tensor + variants: function + +- func: _weight_norm_interface(Tensor v, Tensor g, int dim=0) -> (Tensor, Tensor) + variants: function + dispatch: + CPU: weight_norm_cpu + CUDA: weight_norm_cuda + MPS: weight_norm_mps + autogen: _weight_norm_interface.out + +- func: _weight_norm_interface_backward(Tensor grad_w, Tensor saved_v, Tensor saved_g, Tensor saved_norms, int dim) -> (Tensor, Tensor) + variants: function + dispatch: + CPU: weight_norm_backward_cpu + CUDA: weight_norm_backward_cuda + MPS: weight_norm_backward_mps + autogen: _weight_norm_interface_backward.out + +- func: _weight_norm_differentiable_backward(Tensor grad_w, Tensor saved_v, Tensor saved_g, Tensor saved_norms, int dim) -> (Tensor, Tensor) + variants: function + +- func: zeros.names(int[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: zeros + autogen: zeros.names_out + +- func: _efficientzerotensor(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CPU: _efficientzerotensor + CUDA: _efficientzerotensor_cuda + MPS: _efficientzerotensor_mps + Meta: _efficientzerotensor_meta_symint + autogen: _efficientzerotensor.out + +- func: zeros(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: zeros_symint + +- func: zeros.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: zeros_out + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: zeros_sparse_out + +- func: zeros_like(Tensor self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? 
memory_format=None) -> Tensor + dispatch: + # NB: Although this composite mutates on the inside, it is + # non-differentiable so NonFunctional doesn't apply + CompositeExplicitAutograd, CompositeImplicitAutogradNestedTensor: zeros_like + autogen: zeros_like.out + +- func: _standard_gamma_grad(Tensor self, Tensor output) -> Tensor + variants: function + dispatch: + CPU: _standard_gamma_grad_cpu + CUDA: _standard_gamma_grad_cuda + autogen: _standard_gamma_grad.out + +- func: _standard_gamma(Tensor self, Generator? generator=None) -> Tensor + variants: function + dispatch: + CPU: _s_gamma_cpu + CUDA: _s_gamma_cuda + tags: nondeterministic_seeded + autogen: _standard_gamma.out + +- func: _dirichlet_grad(Tensor x, Tensor alpha, Tensor total) -> Tensor + dispatch: + CPU: _dirichlet_grad_cpu + CUDA: _dirichlet_grad_cuda + autogen: _dirichlet_grad.out + +- func: _sample_dirichlet(Tensor self, Generator? generator=None) -> Tensor + tags: nondeterministic_seeded + variants: function + dispatch: + CPU: _s_dirichlet_cpu + CUDA: _s_dirichlet_cuda + autogen: _sample_dirichlet.out + +- func: poisson(Tensor self, Generator? generator=None) -> Tensor + device_check: NoCheck # TensorIterator + dispatch: + CPU: _s_poisson_cpu + CUDA: _s_poisson_cuda + tags: nondeterministic_seeded + autogen: poisson.out + +- func: binomial(Tensor count, Tensor prob, Generator? generator=None) -> Tensor + device_check: NoCheck # TensorIterator + dispatch: + CPU: _s_binomial_cpu + CUDA: _s_binomial_cuda + tags: nondeterministic_seeded + autogen: binomial.out + +# When more variants get ported to native, this dispatch will get more +# complicated + +- func: native_norm(Tensor self, Scalar p=2) -> Tensor + dispatch: + SparseCPU, SparseCUDA, SparseMPS: norm_sparse + autogen: native_norm.out + +- func: native_norm.ScalarOpt_dim_dtype(Tensor self, Scalar? p, int[1] dim, bool keepdim, ScalarType? dtype) -> Tensor + dispatch: + SparseCPU, SparseCUDA, SparseMPS: norm_sparse + autogen: native_norm.ScalarOpt_dim_dtype_out + +- func: _batch_norm_with_update(Tensor input, Tensor? weight, Tensor? bias, Tensor(a!) running_mean, Tensor(b!) running_var, float momentum, float eps) -> (Tensor, Tensor, Tensor, Tensor) + dispatch: + CPU: _batch_norm_with_update_cpu + CUDA: _batch_norm_with_update_cuda + MPS: _batch_norm_with_update_mps + MkldnnCPU: _batch_norm_with_update_mkldnn + autogen: _batch_norm_with_update_functional + +- func: _batch_norm_with_update.out(Tensor input, Tensor? weight, Tensor? bias, Tensor(a!) running_mean, Tensor(b!) running_var, float momentum, float eps, *, Tensor(d!) out, Tensor(e!) save_mean, Tensor(f!) save_invstd, Tensor(g!) reserve) -> (Tensor(d!), Tensor(e!), Tensor(f!), Tensor(g!)) + dispatch: + CPU: _batch_norm_with_update_cpu_out + CUDA: _batch_norm_with_update_cuda_out + MPS: _batch_norm_with_update_mps_out + +- func: _batch_norm_no_update(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, float momentum, float eps) -> (Tensor, Tensor, Tensor, Tensor) + dispatch: + CompositeExplicitAutograd: _batch_norm_no_update + autogen: _batch_norm_no_update.out + +- func: batch_norm_backward(Tensor grad_out, Tensor input, Tensor weight, Tensor? running_mean, Tensor? running_var, Tensor? save_mean, Tensor? 
save_var, bool update, float eps, bool[3] output_mask, Tensor reserve) -> (Tensor, Tensor, Tensor) + dispatch: + CPU: _new_batch_norm_backward_cpu + CUDA: _new_batch_norm_backward_cuda + MPS: _new_batch_norm_backward_mps + MkldnnCPU: _new_batch_norm_backward_mkldnn + +# TODO: reduce signatures down to one when optional args is available +- func: _sparse_sum(Tensor self) -> Tensor + +- func: _sparse_sum.dtype(Tensor self, *, ScalarType dtype) -> Tensor + +- func: _sparse_sum.dim(Tensor self, int[1] dim) -> Tensor + dispatch: + CompositeExplicitAutograd: _sparse_sum + autogen: _sparse_sum.dim_out + +- func: _sparse_sum.dim_dtype(Tensor self, int[1] dim, *, ScalarType dtype) -> Tensor + +- func: _sparse_sum_backward(Tensor grad, Tensor self, int[] dim) -> Tensor + dispatch: + SparseCPU: _sparse_sum_backward_cpu + SparseCUDA: _sparse_sum_backward_cuda + SparseMPS: _sparse_sum_backward_mps + autogen: _sparse_sum_backward.out + +- func: _sparse_csr_sum.dim_dtype(Tensor self, int[1] dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + dispatch: + SparseCsrCPU: _sparse_csr_sum_cpu + SparseCsrCUDA: _sparse_csr_sum_cuda + autogen: _sparse_csr_sum.dim_dtype_out + +- func: _sparse_csr_prod.dim_dtype(Tensor self, int[1] dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + dispatch: + SparseCsrCPU: _sparse_csr_prod_cpu + SparseCsrCUDA: _sparse_csr_prod_cuda + autogen: _sparse_csr_prod.dim_dtype_out + +- func: _sparse_softmax.int(Tensor self, int dim, ScalarType? dtype=None) -> Tensor + python_module: sparse + variants: function + +- func: _sparse_softmax.Dimname(Tensor self, Dimname dim, *, ScalarType? dtype=None) -> Tensor + python_module: sparse + variants: function + +- func: _sparse_softmax(Tensor self, int dim, bool half_to_float) -> Tensor + python_module: sparse + dispatch: + SparseCPU: softmax_sparse_cpu + SparseCUDA: softmax_sparse_cuda + SparseMPS: softmax_sparse_mps + autogen: _sparse_softmax.out + +- func: _sparse_softmax_backward_data(Tensor grad_output, Tensor output, int dim, Tensor self) -> Tensor + dispatch: + SparseCPU: softmax_backward_sparse_cpu + SparseCUDA: softmax_backward_sparse_cuda + SparseMPS: softmax_backward_sparse_mps + autogen: _sparse_softmax_backward_data.out + +- func: _sparse_log_softmax.int(Tensor self, int dim, ScalarType? dtype=None) -> Tensor + python_module: sparse + variants: function + +- func: _sparse_log_softmax.Dimname(Tensor self, Dimname dim, *, ScalarType? dtype=None) -> Tensor + python_module: sparse + variants: function + +- func: _sparse_log_softmax(Tensor self, int dim, bool half_to_float) -> Tensor + python_module: sparse + dispatch: + SparseCPU: log_softmax_sparse_cpu + SparseCUDA: log_softmax_sparse_cuda + SparseMPS: log_softmax_sparse_mps + autogen: _sparse_log_softmax.out + +- func: _sparse_log_softmax_backward_data(Tensor grad_output, Tensor output, int dim, Tensor self) -> Tensor + dispatch: + SparseCPU: log_softmax_backward_sparse_cpu + SparseCUDA: log_softmax_backward_sparse_cuda + SparseMPS: log_softmax_backward_sparse_mps + autogen: _sparse_log_softmax_backward_data.out + +- func: _spdiags(Tensor diagonals, Tensor offsets, int[] shape, Layout? layout=None) -> Tensor + python_module: sparse + dispatch: + CPU: spdiags + autogen: _spdiags.out + +- func: norm.ScalarOpt_dtype(Tensor self, Scalar? 
p, *, ScalarType dtype) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: norm + autogen: norm.ScalarOpt_dtype_out + tags: reduction + +- func: norm.Scalar(Tensor self, Scalar p=2) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: norm + autogen: norm.Scalar_out + tags: reduction + +- func: norm.ScalarOpt_dim_dtype(Tensor self, Scalar? p, int[1] dim, bool keepdim, *, ScalarType dtype) -> Tensor + structured_delegate: norm.dtype_out + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: sparse_dtype_norm + tags: reduction + +- func: norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> Tensor + structured_delegate: norm.out + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: sparse_norm + tags: reduction + +- func: norm.dtype_out(Tensor self, Scalar? p, int[1] dim, bool keepdim, *, ScalarType dtype, Tensor(a!) out) -> Tensor(a!) + structured: True + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA: norm_dtype_out + MPS: norm_dtype_out_mps + tags: reduction + +- func: norm.out(Tensor self, Scalar? p, int[1] dim, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + structured: True + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA: norm_out + MPS: norm_out_mps + tags: reduction + +# These four redispatch in their implementation, so OK to be CompositeImplicitAutograd +- func: norm.names_ScalarOpt_dim_dtype(Tensor self, Scalar? p, Dimname[1] dim, bool keepdim, *, ScalarType dtype) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + tags: reduction + +- func: norm.names_ScalarOpt_dim(Tensor self, Scalar? p, Dimname[1] dim, bool keepdim=False) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + tags: reduction + +- func: norm.names_dtype_out(Tensor self, Scalar? p, Dimname[1] dim, bool keepdim, *, ScalarType dtype, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + tags: reduction + +- func: norm.names_out(Tensor self, Scalar? p, Dimname[1] dim, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + tags: reduction + +- func: frexp.Tensor(Tensor self) -> (Tensor mantissa, Tensor exponent) + variants: method, function + dispatch: + CompositeExplicitAutograd: frexp + tags: pointwise + +- func: frexp.Tensor_out(Tensor self, *, Tensor(a!) mantissa, Tensor(b!) exponent) -> (Tensor(a!) mantissa, Tensor(b!) exponent) + dispatch: + CPU, CUDA: frexp_out + tags: pointwise + +# Deprecated (v.1.12) +- func: frobenius_norm.dim(Tensor self, int[1] dim, bool keepdim=False) -> Tensor + variants: function + +# Deprecated (v.1.12) +- func: frobenius_norm.out(Tensor self, int[1] dim, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + variants: function + +# Deprecated (v.1.12) +- func: nuclear_norm(Tensor self, bool keepdim=False) -> Tensor + variants: function + +# Deprecated (v.1.12) +- func: nuclear_norm.out(Tensor self, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + variants: function + +# Deprecated (v.1.12) +- func: nuclear_norm.dim(Tensor self, int[2] dim, bool keepdim=False) -> Tensor + variants: function + +# Deprecated (v.1.12) +- func: nuclear_norm.dim_out(Tensor self, int[2] dim, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) 
+ variants: function + +- func: clone(Tensor self, *, MemoryFormat? memory_format=None) -> Tensor + variants: function, method + dispatch: + CompositeExplicitAutograd: clone + SparseCPU, SparseCUDA, SparseMPS: clone_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: clone_sparse_compressed + MkldnnCPU: mkldnn_clone + QuantizedCPU, QuantizedCUDA: quantized_clone + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: clone_nested + autogen: clone.out + tags: [core, pointwise] + +- func: positive(Tensor(a) self) -> Tensor(a) + variants: function, method + tags: pointwise + +- func: resize_as_(Tensor(a!) self, Tensor the_template, *, MemoryFormat? memory_format=None) -> Tensor(a!) + use_const_ref_for_mutable_tensors: True + variants: function, method + dispatch: + CompositeExplicitAutograd: resize_as_ + autogen: resize_as, resize_as.out + tags: inplace_view + +- func: resize_as_sparse_(Tensor(a!) self, Tensor the_template) -> Tensor(a!) + use_const_ref_for_mutable_tensors: True + variants: function, method + dispatch: + SparseCPU, SparseCUDA: resize_as_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: resize_as_sparse_compressed_ + autogen: resize_as_sparse, resize_as_sparse.out + +- func: zero_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + CPU, CUDA: zero_ + MPS: zero_mps_ + Meta: zero_meta_ + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: zero_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: zero_sparse_csr_ + MkldnnCPU: mkldnn_zero_ + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: zero_nested_ + autogen: zero, zero.out + +- func: sub.out(Tensor self, Tensor other, *, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: sub_out + MPS: sub_out_mps + MTIA: sub_out_mtia + SparseCPU, SparseCUDA, SparseMPS: sub_out_sparse + tags: pointwise + +- func: sub.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: sub.out + dispatch: + SparseCPU, SparseCUDA, SparseMPS: sub_sparse + ZeroTensor: sub_zerotensor + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_sub_Tensor + tags: [core, pointwise] + +- func: sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + structured_delegate: sub.out + dispatch: + SparseCPU, SparseCUDA, SparseMPS: sub_sparse_ + tags: pointwise +# For C++ only, until we have conversion from C++ numbers to Tensor + +- func: sub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: sub + tags: [core, pointwise] + +- func: sub_.Scalar(Tensor(a!) self, Scalar other, Scalar alpha=1) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CompositeExplicitAutograd: sub_ + autogen: sub.Scalar_out + tags: pointwise +# subtract, alias for sub + +- func: subtract.out(Tensor self, Tensor other, *, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!) + +- func: subtract.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor + variants: function, method + +- func: subtract_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!) 
+  variants: method
+
+# For C++ only, until we have conversion from C++ numbers to Tensor
+- func: subtract.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> Tensor
+  variants: function, method
+
+- func: subtract_.Scalar(Tensor(a!) self, Scalar other, Scalar alpha=1) -> Tensor(a!)
+  variants: method
+
+- func: rsub.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
+  device_check: NoCheck # TensorIterator
+  variants: function
+  dispatch:
+    CPU, CUDA, MPS, MTIA: rsub
+  autogen: rsub.Tensor_out
+
+- func: heaviside.out(Tensor self, Tensor values, *, Tensor(a!) out) -> Tensor(a!)
+  structured: True
+  structured_inherits: TensorIteratorBase
+  device_check: NoCheck # TensorIterator
+  dispatch:
+    CPU, CUDA: heaviside_out
+  tags: pointwise
+
+- func: heaviside(Tensor self, Tensor values) -> Tensor
+  device_check: NoCheck # TensorIterator
+  variants: function, method
+  structured_delegate: heaviside.out
+  tags: pointwise
+
+- func: heaviside_(Tensor(a!) self, Tensor values) -> Tensor(a!)
+  device_check: NoCheck # TensorIterator
+  variants: method
+  structured_delegate: heaviside.out
+
+# For C++ only, until we have conversion from C++ numbers to Tensor
+- func: rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> Tensor
+  device_check: NoCheck # TensorIterator
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: rsub
+  autogen: rsub.Scalar_out
+
+# Functionally the same as addmm, but we give it a different derivative formula
+# that doesn't propagate gradients to non-present entries on sparse.
+- func: _sparse_addmm(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> Tensor
+  python_module: sparse
+  dispatch:
+    CompositeExplicitAutograd: _sparse_addmm
+  autogen: _sparse_addmm.out
+
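+# (Illustrative only: a hedged sketch of what the masked derivative means in
+# practice; torch.sparse.addmm is the public wrapper that is expected to route
+# through this op for a sparse COO `mat1`.)
+#
+#   import torch
+#   i = torch.tensor([[0, 1], [1, 0]])
+#   v = torch.tensor([1., 2.], requires_grad=True)
+#   s = torch.sparse_coo_tensor(i, v, (2, 2))
+#   out = torch.sparse.addmm(torch.zeros(2, 2), s, torch.ones(2, 2))
+#   out.sum().backward()
+#   # v.grad covers only the two present entries of `s`; entries absent from
+#   # the sparse pattern receive no gradient, unlike dense addmm.
+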
+- func: sparse_sampled_addmm.out(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!)
+  python_module: sparse
+  dispatch:
+    SparseCsrCUDA: sparse_sampled_addmm_out_sparse_csr_cuda
+    SparseCsrCPU: sparse_sampled_addmm_out_sparse_csr_cpu
+
+- func: sparse_sampled_addmm(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> Tensor
+  python_module: sparse
+  dispatch:
+    SparseCsrCUDA: sparse_sampled_addmm_sparse_csr_cuda
+    SparseCsrCPU: sparse_sampled_addmm_sparse_csr_cpu
+
+- func: _sparse_mm_reduce_impl(Tensor self, Tensor other, str reduce) -> (Tensor, Tensor)
+  python_module: sparse
+  dispatch:
+    SparseCsrCPU: _sparse_mm_reduce_impl_sparse_csr_cpu
+
+- func: _sparse_mm_reduce_impl_backward(Tensor self, Tensor grad_out, Tensor weight, str reduce, Tensor arg_out, bool[2] output_mask) -> (Tensor, Tensor)
+  python_module: sparse
+  dispatch:
+    SparseCsrCPU: _sparse_mm_reduce_impl_backward_sparse_csr_cpu
+
+- func: addmm.out(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!)
+  structured: True
+  dispatch:
+    CPU: addmm_out_cpu
+    CUDA: addmm_out_cuda
+    MPS: addmm_out_mps
+    XPU: addmm_out_xpu
+    MTIA: addmm_out_mtia
+    SparseCPU: addmm_out_sparse_dense_cpu
+    SparseCUDA: addmm_out_sparse_dense_cuda
+    SparseMPS: addmm_out_sparse_dense_mps
+    SparseCsrCPU: addmm_out_sparse_compressed_cpu
+    SparseCsrCUDA: addmm_out_sparse_compressed_cuda
+
+- func: addmm(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> Tensor
+  structured_delegate: addmm.out
+  variants: function, method
+  dispatch:
+    SparseCPU: addmm_sparse_dense_cpu
+    SparseCUDA: addmm_sparse_dense_cuda
+    SparseMPS: addmm_sparse_dense_mps
+    SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: addmm_sparse_compressed_dense
+  tags: core
+
+- func: addmm.dtype(Tensor self, Tensor mat1, Tensor mat2, ScalarType out_dtype, *, Scalar beta=1, Scalar alpha=1) -> Tensor
+  dispatch:
+    CUDA: _addmm_dtype_cuda
+
+- func: addmm.dtype_out(Tensor self, Tensor mat1, Tensor mat2, ScalarType out_dtype, *, Scalar beta=1, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!)
+  dispatch:
+    CUDA: _addmm_dtype_out_cuda
+
+- func: addmm_(Tensor(a!) self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> Tensor(a!)
+  structured_delegate: addmm.out
+  variants: method
+  dispatch:
+    # Warning! For whatever reason, the inplace sparse addmm is NON
+    # broadcasting
+    SparseCPU: s_addmm_sparse_dense_cpu_
+    SparseCUDA: s_addmm_sparse_dense_cuda_
+
+- func: _addmm_activation.out(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1, bool use_gelu=False, Tensor(a!) out) -> Tensor(a!)
+  structured: True
+  dispatch:
+    CPU: addmm_activation_out_cpu
+    CUDA: addmm_activation_out_cuda
+    XPU: addmm_activation_out_xpu
+
+- func: _addmm_activation(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1, bool use_gelu=False) -> Tensor
+  structured_delegate: _addmm_activation.out
+  variants: function, method
+
+- func: _scaled_mm(Tensor self, Tensor mat2, Tensor scale_a, Tensor scale_b, Tensor? bias=None, Tensor? scale_result=None, ScalarType? out_dtype=None, bool use_fast_accum=False) -> Tensor
+  variants: function
+  dispatch:
+    CPU: _scaled_mm_cpu
+    CUDA: _scaled_mm_cuda
+    XPU: _scaled_mm_xpu
+  tags: needs_exact_strides
+
+- func: _scaled_mm.out(Tensor self, Tensor mat2, Tensor scale_a, Tensor scale_b, Tensor? bias=None, Tensor? scale_result=None, ScalarType? out_dtype=None, bool use_fast_accum=False, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  dispatch:
+    CPU: _scaled_mm_out_cpu
+    CUDA: _scaled_mm_out_cuda
+    XPU: _scaled_mm_out_xpu
+  tags: needs_exact_strides
+
+- func: _scaled_mm_v2(Tensor self, Tensor mat2, Tensor[] scale_a, int[] recipe_a, int[] swizzle_a, Tensor[] scale_b, int[] recipe_b, int[] swizzle_b, Tensor? bias, ScalarType? out_dtype, int[] contraction_dim=[], bool use_fast_accum=False) -> Tensor
+  variants: function
+  dispatch:
+    CUDA: _scaled_mm_cuda_v2
+    XPU: _scaled_mm_xpu_v2
+
+- func: _scaled_mm_v2.out(Tensor self, Tensor mat2, Tensor[] scale_a, int[] recipe_a, int[] swizzle_a, Tensor[] scale_b, int[] recipe_b, int[] swizzle_b, Tensor? bias, ScalarType? out_dtype, int[] contraction_dim=[], bool use_fast_accum=False, *, Tensor(a!) out) -> Tensor(a!)
+  variants: function
+  dispatch:
+    CUDA: _scaled_mm_cuda_v2_out
+    XPU: _scaled_mm_xpu_v2_out
+
+- func: _scaled_grouped_mm(Tensor self, Tensor mat2, Tensor scale_a, Tensor scale_b, Tensor? offs=None, Tensor? bias=None, Tensor? scale_result=None, ScalarType? out_dtype=None, bool use_fast_accum=False) -> Tensor
+  variants: function
+  dispatch:
+    CUDA: _scaled_grouped_mm_cuda
+  tags: needs_exact_strides
+
+- func: _scaled_grouped_mm_v2(Tensor self, Tensor mat2, Tensor[] scale_a, int[] recipe_a, int[] swizzle_a, Tensor[] scale_b, int[] recipe_b, int[] swizzle_b, Tensor? offs=None, Tensor? bias=None, ScalarType? out_dtype=None, int[] contraction_dim=[], bool use_fast_accum=False) -> Tensor
+  variants: function
+  dispatch:
+    CUDA: _scaled_grouped_mm_cuda_v2
+  tags: needs_exact_strides
+
+- func: _grouped_mm(Tensor self, Tensor mat2, Tensor? offs=None, Tensor? bias=None, ScalarType? out_dtype=None) -> Tensor
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: _grouped_mm
+    CUDA: _grouped_mm_cuda
+
+# NOTE [ Sparse: autograd and API ]
+#
+#
+# Sparse Tensor Constructors
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~
+#
+# The API entry points to sparse tensor construction should be
+# `sparse_coo_tensor` and `_sparse_coo_tensor_unsafe`. Depending on whether the
+# indices and values tensors are given, they eventually dispatch to either
+# `sparse_coo_tensor_with_dims` or `sparse_coo_tensor_with_dims_and_tensors`.
+#
+# The autograd support for the ctor is implemented on `sparse_coo_tensor_with_dims_and_tensors`.
+#
+# The API methods `sparse_coo_tensor` and `_sparse_coo_tensor_unsafe`
+# **must not** have specific type dispatches because otherwise codegen will
+# consider them as abstract methods (see Note [Abstract ATen methods]), dispatch
+# using **Tensor** type, and thus lose autograd tracking on the actual method
+# they dispatch to, e.g., `sparse_coo_tensor_with_dims_and_tensors`.
+#
+#
+# Sparse Methods API Design
+# ~~~~~~~~~~~~~~~~~~~~~~~~~
+#
+# Goals: 1. Flexible API for users to write custom sparse ops
+#        2. ctor and member accessor with autograd support
+#
+# To achieve 1, we need to provide a set of *dangerous* APIs (dangerous in the
+# sense that misusing them will break sparse tensor invariants and may result in
+# unexpected behavior, e.g., crash). These methods are all prefixed with
+# underscore "_" to indicate that they should be used with care. We provide:
+#
+# + `_indices()`: returns the *raw* indices within the sparse tensor (not just
+#                 sharing storage). Any inplace operation will change the
+#                 actual indices, including t_, set_, as_strided_, resize_,
+#                 etc.
+# + `_values()`: returns the *raw* values within the sparse tensor. Similar
+#                semantics as `_indices()`
+# + `_nnz()`: returns the number of non-zero entries. This will always be
+#             determined by the shapes of indices and values.
+# + `_coalesced_(bool)`: inplace sets whether the tensor is coalesced, and
+#                        returns itself.
+#
+# These methods are very useful in writing new operations, e.g., a custom
+# autograd Function.
+#
+# We also provide other public *safe* APIs:
+# + `indices()`: returns a **view** of the indices tensor if the sparse tensor
+#                is **coalesced**.
+# + `values()`: returns a **view** of the values tensor if the containing
+#               sparse tensor is **coalesced**.
+# + `sparse_dim()`: number of sparse dimensions
+# + `dense_dim()`: number of dense dimensions
+# + `is_coalesced()`: whether the sparse tensor is coalesced
+#
+# `_indices()` and `_values()` should return the raw indices and values dense
+# tensors within a sparse tensor. They can be quite unsafe with inplace
+# operations like `t_()`, and expose uncoalesced indices and values. The public
+# recommended API is `indices()` and `values()`, both of which first check that
+# the tensor is coalesced and return views on those tensors.
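+#
+# (Illustrative only: a minimal Python sketch of the raw vs. safe accessors
+# described above.)
+#
+#   import torch
+#   i = torch.tensor([[0, 0, 1], [1, 1, 2]])   # note the duplicate index (0, 1)
+#   v = torch.tensor([1., 2., 3.])
+#   s = torch.sparse_coo_tensor(i, v, (2, 3))  # constructed uncoalesced
+#   s._nnz()        # 3: duplicates are still present
+#   s._values()     # raw, possibly uncoalesced values
+#   c = s.coalesce()
+#   c._nnz()        # 2: the duplicate entries were summed
+#   c.indices(); c.values()  # safe views; these raise on the uncoalesced `s`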
+#
+#
+# Autograd Support
+# ~~~~~~~~~~~~~~~~
+#
+# Autograd is supported on `values()` and the sparse tensor ctor with indices and
+# values tensors. E.g., `torch.sparse_coo_tensor(i, v).values().sum()` is
+# differentiable w.r.t. `v`.
+#
+# NB: The `values()` and `_values()` operators are special in that they are
+# layout-aware, i.e., the output depends not just on the data it represents, but
+# also on the input layout details (in this case, the `indices` tensor). See
+# NOTE [ as_strided Backward and layout-aware/agnostic autograd ] in Functions.cpp
+# for discussion on layout-aware vs layout-agnostic autograd. Since PyTorch ops
+# operate in the layout-agnostic mode, similar to `as_strided`, backward of
+# these two operators needs to consider them in a layout-agnostic way:
+# + `values()`:
+#     Input is coalesced.
+#     We just pretend having `input.indices()` as an additional argument
+#     `input_indices`, then forward is similar to
+#     `input.to(kStrided).index_select(input_indices)` regardless of the layout.
+#     Note that `values()` normally is layout-aware even if we constrain
+#     ourselves on sparse inputs since it may include all-zero values entries
+#     as "present" entries.
+# + `_values()`:
+#     Input may be uncoalesced.
+#     It is not straightforward to construct a layout-agnostic version because
+#     duplicate indices entries may exist and additional parameterization is
+#     needed to distribute the value into different values entries. Furthermore,
+#     this op is intended to provide ways to write custom sparse ops, rather
+#     than being used in the autograd graph, so it is marked as *non-differentiable*
+#     in derivatives.yaml.
+#
+# Before reading the following, see NOTE [ Autograd Variable Views ] in
+# variable.h for details on views that are tracked by autograd, and views that
+# are not.
+#
+# Moreover, these methods return tensors that share storage with inputs, so we
+# mark these methods as view ops to support autograd history tracking.
+# The sparse tensor ctor output should technically be a view of both the input
+# indices and values tensors, but currently we only support setting as view of a
+# single Variable, so it is only a view of the values tensor.
+# TODO: clone indices in sparse tensor ctor.
+#
+# For other methods that return outputs that share storage with inputs, i.e.,
+# `indices()` and `_indices()`, we mark their outputs as non-differentiable, so
+# the view relation is not tracked by autograd, but the version counter is still
+# shared. In other words, their outputs are non-differentiable views of the
+# sparse tensor.
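+#
+# (Illustrative only: the differentiability claim above, as a runnable sketch.)
+#
+#   import torch
+#   i = torch.tensor([[0, 1], [1, 0]])
+#   v = torch.tensor([1., 2.], requires_grad=True)
+#   torch.sparse_coo_tensor(i, v, (2, 2)).coalesce().values().sum().backward()
+#   v.grad  # tensor([1., 1.]) -- the gradient flows back to `v` through the ctor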
+# FIXME: would be nicer if TensorOptions were optional-based; not adding default arguments for options given
+# the default would never make sense.
+
+- func: _sparse_compressed_tensor_with_dims(int nnz, int dense_dim, int[] size, int[] blocksize, ScalarType index_dtype, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: sparse_compressed_tensor_with_dims
+
+- func: sparse_compressed_tensor.comp_plain_value_size(Tensor compressed_indices, Tensor plain_indices, Tensor values, SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: sparse_compressed_tensor
+
+- func: sparse_csr_tensor.crow_col_value_size(Tensor crow_indices, Tensor col_indices, Tensor values, int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor
+- func: sparse_csc_tensor.ccol_row_value_size(Tensor ccol_indices, Tensor row_indices, Tensor values, int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor
+- func: sparse_bsr_tensor.crow_col_value_size(Tensor crow_indices, Tensor col_indices, Tensor values, int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor
+- func: sparse_bsc_tensor.ccol_row_value_size(Tensor ccol_indices, Tensor row_indices, Tensor values, int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor
+
+- func: sparse_compressed_tensor.comp_plain_value(Tensor compressed_indices, Tensor plain_indices, Tensor values, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: sparse_compressed_tensor
+- func: sparse_csr_tensor.crow_col_value(Tensor crow_indices, Tensor col_indices, Tensor values, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor
+- func: sparse_csc_tensor.ccol_row_value(Tensor ccol_indices, Tensor row_indices, Tensor values, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor
+- func: sparse_bsr_tensor.crow_col_value(Tensor crow_indices, Tensor col_indices, Tensor values, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor
+- func: sparse_bsc_tensor.ccol_row_value(Tensor ccol_indices, Tensor row_indices, Tensor values, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor
+
+- func: _sparse_compressed_tensor_unsafe(Tensor compressed_indices, Tensor plain_indices, Tensor values, SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+  dispatch:
+    CompositeImplicitAutograd: _sparse_compressed_tensor_unsafe_symint
+
+- func: _sparse_csr_tensor_unsafe(Tensor crow_indices, Tensor col_indices, Tensor values, int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+- func: _sparse_csc_tensor_unsafe(Tensor ccol_indices, Tensor row_indices, Tensor values, int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+- func: _sparse_bsr_tensor_unsafe(Tensor crow_indices, Tensor col_indices, Tensor values, int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+- func: _sparse_bsc_tensor_unsafe(Tensor ccol_indices, Tensor row_indices, Tensor values, int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+
+- func: sparse_coo_tensor.size(int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor
+  dispatch:
+    CompositeExplicitAutograd: sparse_coo_tensor
+  autogen: sparse_coo_tensor.size_out
+
+- func: sparse_coo_tensor.indices(Tensor indices, Tensor values, *, ScalarType? dtype=None, Layout? 
layout=None, Device? device=None, bool? pin_memory=None, bool? is_coalesced=None) -> Tensor + +- func: sparse_coo_tensor.indices_size(Tensor indices, Tensor values, int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, bool? is_coalesced=None) -> Tensor + +- func: _sparse_coo_tensor_unsafe(Tensor indices, Tensor values, SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, bool? is_coalesced=None) -> Tensor + dispatch: + CompositeImplicitAutograd: _sparse_coo_tensor_unsafe_symint + +- func: _validate_sparse_coo_tensor_args(Tensor indices, Tensor values, int[] size, bool? is_coalesced=None, bool? check_pinning=None) -> () + +- func: _validate_sparse_compressed_tensor_args(Tensor compressed_indices, Tensor plain_indices, Tensor values, int[] size, Layout layout, bool? check_pinning=None) -> () +- func: _validate_sparse_csr_tensor_args(Tensor crow_indices, Tensor col_indices, Tensor values, int[] size, bool? check_pinning=None) -> () +- func: _validate_sparse_csc_tensor_args(Tensor ccol_indices, Tensor row_indices, Tensor values, int[] size, bool? check_pinning=None) -> () +- func: _validate_sparse_bsr_tensor_args(Tensor crow_indices, Tensor col_indices, Tensor values, int[] size, bool? check_pinning=None) -> () +- func: _validate_sparse_bsc_tensor_args(Tensor ccol_indices, Tensor row_indices, Tensor values, int[] size, bool? check_pinning=None) -> () + +- func: _sparse_coo_tensor_with_dims(int sparse_dim, int dense_dim, int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor + dispatch: + SparseCPU, SparseCUDA, SparseMeta, SparseMPS, Meta: new_with_dims_sparse + autogen: _sparse_coo_tensor_with_dims.out + +- func: _sparse_coo_tensor_with_dims_and_tensors(int sparse_dim, int dense_dim, SymInt[] size, Tensor indices, Tensor values, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? is_coalesced=None) -> Tensor + dispatch: + SparseCPU, SparseCUDA, SparseMeta, SparseMPS, Meta: new_with_dims_and_tensor_sparse_symint + autogen: _sparse_coo_tensor_with_dims_and_tensors.out + +- func: sparse_resize_(Tensor(a!) self, int[] size, int sparse_dim, int dense_dim) -> Tensor(a!) + use_const_ref_for_mutable_tensors: True + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: sparse_resize_ + autogen: sparse_resize, sparse_resize.out + +- func: sparse_resize_and_clear_(Tensor(a!) self, int[] size, int sparse_dim, int dense_dim) -> Tensor(a!) + use_const_ref_for_mutable_tensors: True + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: sparse_resize_and_clear_ + autogen: sparse_resize_and_clear, sparse_resize_and_clear.out + +- func: sparse_mask(Tensor self, Tensor mask) -> Tensor + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: sparse_mask + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sparse_mask_sparse_compressed + autogen: sparse_mask.out + +- func: _sparse_mask_projection(Tensor self, Tensor mask, bool accumulate_matches=False) -> Tensor + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: sparse_mask_projection + autogen: _sparse_mask_projection.out + +- func: _to_cpu(Tensor[] tensors) -> Tensor[] + variants: function + +- func: to_dense(Tensor self, ScalarType? dtype=None, *, bool? 
masked_grad=None) -> Tensor + variants: method + +# Special case of to_dense with custom derivative +- func: _to_dense(Tensor self, ScalarType? dtype=None, bool? masked_grad=None) -> Tensor + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: sparse_to_dense + SparseCsrCPU, SparseCsrCUDA, SparseCsrMPS, SparseCsrMeta: sparse_compressed_to_dense + MkldnnCPU: mkldnn_to_dense + autogen: _to_dense.out + +- func: to_dense_backward(Tensor grad, Tensor input, bool? masked_grad=None) -> Tensor + +- func: sparse_dim(Tensor self) -> int + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: sparse_dim_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMPS, SparseCsrMeta: sparse_dim_sparse_csr + CompositeExplicitAutograd: sparse_dim_default + device_check: NoCheck + device_guard: False + +# legacy method +- func: _dimI(Tensor self) -> int + variants: method + dispatch: + SparseCPU, SparseCUDA: sparse_dim_sparse + device_check: NoCheck + device_guard: False + +- func: dense_dim(Tensor self) -> int + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: dense_dim_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMPS, SparseCsrMeta: dense_dim_sparse_csr + CompositeExplicitAutograd: dense_dim_default + device_check: NoCheck + device_guard: False + +# legacy method +- func: _dimV(Tensor self) -> int + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMeta: dense_dim_sparse + device_check: NoCheck + device_guard: False + +- func: _nnz(Tensor self) -> int + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: _nnz_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMPS, SparseCsrMeta: _nnz_sparse_csr + device_check: NoCheck + device_guard: False + +# NOTE: [ coalesce autograd ] +# coalesce returns self directly for already coalesced sparse tensors. +# This means coalesce cannot have a derivative registered; otherwise it would create +# circular references in the autograd graph (see gh-52874). +# Instead, the derivative is registered on the slow path "_coalesce". +- func: coalesce(Tensor(a) self) -> Tensor(a) + variants: method + +- func: _coalesce(Tensor self) -> Tensor + dispatch: + SparseCPU: _coalesce_sparse_cpu + SparseCUDA: _coalesce_sparse_cuda + SparseMPS: _coalesce_sparse_mps + autogen: _coalesce.out + +- func: is_coalesced(Tensor self) -> bool + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: is_coalesced_sparse + CompositeExplicitAutograd: is_coalesced_default + device_check: NoCheck + device_guard: False + +- func: _indices(Tensor(a) self) -> Tensor(a) + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: _indices_sparse + device_check: NoCheck + device_guard: False + +- func: _values(Tensor(a) self) -> Tensor(a) + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: _values_sparse + device_check: NoCheck + device_guard: False + +# This method doesn't do any checks; it only directly sets the flag, so it can be +# a bit unsafe. Similar to _indices and _values, this is useful for implementing +# custom sparse operations in Python/C++ extensions. +- func: _coalesced_(Tensor(a!) self, bool coalesced) -> Tensor(a!)
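+# A minimal usage sketch of these raw accessors (illustrative only, not part of
+# the generated schema; it assumes only the public torch Python API):
+#
+#   >>> import torch
+#   >>> i = torch.tensor([[0, 1, 1], [2, 0, 2]])
+#   >>> v = torch.tensor([3., 4., 5.], requires_grad=True)
+#   >>> s = torch.sparse_coo_tensor(i, v, (2, 3)).coalesce()
+#   >>> s.values().sum().backward()                # differentiable w.r.t. v, per the note above
+#   >>> raw_i, raw_v = s._indices(), s._values()   # unchecked accessors, non-differentiable
+#   >>> t = torch.sparse_coo_tensor(raw_i, raw_v.detach(), s.shape)
+#   >>> _ = t._coalesced_(True)   # caller asserts coalescedness; nothing is verified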
+ variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: _coalesced_sparse_ + device_check: NoCheck + device_guard: False + autogen: _coalesced, _coalesced.out + +- func: indices(Tensor(a) self) -> Tensor(a) + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: indices_sparse + CompositeExplicitAutograd: indices_default + device_check: NoCheck + device_guard: False + +- func: values(Tensor(a) self) -> Tensor(a) + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: values_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: values_sparse_csr + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: values_nested + CompositeExplicitAutograd: values_default + device_check: NoCheck + device_guard: False + +- func: crow_indices(Tensor(a) self) -> Tensor(a) + variants: method + dispatch: + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: crow_indices_sparse_csr + CompositeExplicitAutograd: crow_indices_default + device_check: NoCheck + device_guard: False + +- func: col_indices(Tensor(a) self) -> Tensor(a) + variants: method + dispatch: + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: col_indices_sparse_csr + CompositeExplicitAutograd: col_indices_default + device_check: NoCheck + device_guard: False + +- func: ccol_indices(Tensor(a) self) -> Tensor(a) + variants: method + dispatch: + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: ccol_indices_sparse_csr + CompositeExplicitAutograd: ccol_indices_default + device_check: NoCheck + device_guard: False + +- func: row_indices(Tensor(a) self) -> Tensor(a) + variants: method + dispatch: + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: row_indices_sparse_csr + CompositeExplicitAutograd: row_indices_default + device_check: NoCheck + device_guard: False + +- func: hspmm.out(Tensor mat1, Tensor mat2, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + SparseCPU: hspmm_out_sparse_cpu + SparseCUDA: hspmm_out_sparse_cuda + +- func: hspmm(Tensor mat1, Tensor mat2) -> Tensor + dispatch: + SparseCPU: hspmm_sparse_cpu + SparseCUDA: hspmm_sparse_cuda + +- func: copy_sparse_to_sparse_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!) + device_check: NoCheck # Allows copy into different device + variants: function + dispatch: + SparseCPU, SparseCUDA, SparseMPS, SparseMeta: copy_sparse_ + autogen: copy_sparse_to_sparse, copy_sparse_to_sparse.out + +# By adding the AutogradNestedTensor this makes this function CompositeImplicit-like for nested tensors +- func: unbind.int(Tensor(a -> *) self, int dim=0) -> Tensor(a)[] + variants: function, method + dispatch: + CompositeExplicitAutograd: unbind + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_unbind + +- func: unbind.Dimname(Tensor(a -> *) self, Dimname dim) -> Tensor(a)[] + variants: function, method + +- func: to_sparse.sparse_dim(Tensor self, int sparse_dim) -> Tensor + variants: method + +# Special case of to_sparse.sparse_dim with custom derivative +- func: _to_sparse.sparse_dim(Tensor self, int sparse_dim) -> Tensor + variants: method + dispatch: + CPU, CUDA, MPS: dense_to_sparse + SparseCPU, SparseCUDA, SparseMPS: sparse_coo_to_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta, SparseCsrMPS: sparse_compressed_to_sparse + autogen: _to_sparse.sparse_dim_out + +- func: to_sparse(Tensor self, *, Layout? layout=None, int[2]? blocksize=None, int? dense_dim=None) -> Tensor + variants: method + +# Special case of to_sparse with custom derivative +- func: _to_sparse(Tensor self, *, Layout? layout=None, int[2]? blocksize=None, int? 
dense_dim=None) -> Tensor + variants: method + dispatch: + CPU, CUDA, MPS: dense_to_sparse + SparseCPU, SparseCUDA, SparseMPS: sparse_coo_to_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sparse_compressed_to_sparse + autogen: _to_sparse.out + +- func: to_sparse_csr(Tensor self, int? dense_dim=None) -> Tensor + variants: method + +# Special case of to_sparse_csr with custom derivative +- func: _to_sparse_csr(Tensor self, int? dense_dim=None) -> Tensor + variants: method + dispatch: + CPU, CUDA: dense_to_sparse_csr + SparseCPU, SparseCUDA: coo_to_sparse_csr + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sparse_compressed_to_sparse_csr + autogen: _to_sparse_csr.out + +- func: to_sparse_csc(Tensor self, int? dense_dim=None) -> Tensor + variants: method + +# Special case of to_sparse_csc with custom derivative +- func: _to_sparse_csc(Tensor self, int? dense_dim=None) -> Tensor + variants: method + dispatch: + CPU, CUDA: dense_to_sparse_csc + SparseCPU, SparseCUDA: coo_to_sparse_csc + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sparse_compressed_to_sparse_csc + autogen: _to_sparse_csc.out + +- func: to_sparse_bsr(Tensor self, int[2] blocksize, int? dense_dim=None) -> Tensor + variants: method + +# Special case of to_sparse_bsr with custom derivative +- func: _to_sparse_bsr(Tensor self, int[2] blocksize, int? dense_dim=None) -> Tensor + variants: method + dispatch: + CPU, CUDA: dense_to_sparse_bsr + SparseCPU, SparseCUDA: coo_to_sparse_bsr + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sparse_compressed_to_sparse_bsr + autogen: _to_sparse_bsr.out + +- func: to_sparse_bsc(Tensor self, int[2] blocksize, int? dense_dim=None) -> Tensor + variants: method + +# Special case of to_sparse_bsc with custom derivative +- func: _to_sparse_bsc(Tensor self, int[2] blocksize, int? dense_dim=None) -> Tensor + variants: method + dispatch: + CPU, CUDA: dense_to_sparse_bsc + SparseCPU, SparseCUDA: coo_to_sparse_bsc + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sparse_compressed_to_sparse_bsc + autogen: _to_sparse_bsc.out + +- func: _to_sparse_semi_structured(Tensor dense) -> (Tensor, Tensor) + variants: function + dispatch: + CUDA: _to_sparse_semi_structured + +- func: to_mkldnn(Tensor self, ScalarType? dtype=None) -> Tensor + variants: method + dispatch: + CPU: dense_to_mkldnn + autogen: to_mkldnn.out + +- func: mkldnn_reorder_conv2d_weight(Tensor self, SymInt[2] padding=0, SymInt[2] stride=1, SymInt[2] dilation=1, SymInt groups=1, SymInt[]? input_size=None) -> Tensor + variants: function + python_module: nn + dispatch: + MkldnnCPU: mkldnn_reorder_conv2d_weight + autogen: mkldnn_reorder_conv2d_weight.out + +- func: mkldnn_reorder_conv3d_weight(Tensor self, SymInt[3] padding=0, SymInt[3] stride=1, SymInt[3] dilation=1, SymInt groups=1, SymInt[]? 
input_size=None) -> Tensor + variants: function + python_module: nn + dispatch: + MkldnnCPU: mkldnn_reorder_conv3d_weight + autogen: mkldnn_reorder_conv3d_weight.out + +- func: to_mkldnn_backward(Tensor grad, Tensor input) -> Tensor + +- func: quantize_per_tensor_dynamic(Tensor self, ScalarType dtype, bool reduce_range) -> Tensor + variants: function + dispatch: + CPU, CUDA: quantize_per_tensor_dynamic + autogen: quantize_per_tensor_dynamic.out + +- func: quantize_per_tensor(Tensor self, float scale, int zero_point, ScalarType dtype) -> Tensor + variants: function + dispatch: + CPU, CUDA: quantize_per_tensor + autogen: quantize_per_tensor.out + +- func: quantize_per_tensor.tensor_qparams(Tensor self, Tensor scale, Tensor zero_point, ScalarType dtype) -> Tensor + variants: function + dispatch: + CPU, CUDA: quantize_per_tensor_tensor_qparams + autogen: quantize_per_tensor.tensor_qparams_out + +- func: quantize_per_tensor.tensors(Tensor[] tensors, Tensor scales, Tensor zero_points, ScalarType dtype) -> Tensor[] + variants: function + dispatch: + CPU: quantize_per_tensor_list_cpu + autogen: quantize_per_tensor.tensors_out + +- func: quantize_per_channel(Tensor self, Tensor scales, Tensor zero_points, int axis, ScalarType dtype) -> Tensor + variants: function + dispatch: + CPU, CUDA: quantize_per_channel + autogen: quantize_per_channel.out + +- func: dequantize.self(Tensor self) -> Tensor + variants: function, method + dispatch: + CPU, CUDA: dequantize_cpu_or_cuda + QuantizedCPU, QuantizedCUDA: dequantize_quantized + autogen: dequantize.self_out + +- func: dequantize.tensors(Tensor[] tensors) -> Tensor[] + variants: function + dispatch: + QuantizedCPU: dequantize_tensors_quantized_cpu + autogen: dequantize.tensors_out + +- func: q_scale(Tensor self) -> float + variants: function, method + dispatch: + QuantizedCPU, QuantizedCUDA: q_scale_quant + +- func: q_zero_point(Tensor self) -> int + variants: function, method + dispatch: + QuantizedCPU, QuantizedCUDA: q_zero_point_quant + +- func: q_per_channel_scales(Tensor self) -> Tensor + variants: function, method + dispatch: + QuantizedCPU, QuantizedCUDA: q_per_channel_scales + autogen: q_per_channel_scales.out + +- func: q_per_channel_zero_points(Tensor self) -> Tensor + variants: function, method + dispatch: + QuantizedCPU, QuantizedCUDA: q_per_channel_zero_points + autogen: q_per_channel_zero_points.out + +- func: q_per_channel_axis(Tensor self) -> int + variants: function, method + dispatch: + QuantizedCPU, QuantizedCUDA: q_per_channel_axis + +- func: int_repr(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + QuantizedCPU: int_repr_quantized_cpu + QuantizedCUDA: int_repr_quantized_cuda + autogen: int_repr.out + +- func: _make_per_tensor_quantized_tensor(Tensor self, float scale, int zero_point) -> Tensor + dispatch: + CPU: make_per_tensor_quantized_tensor_cpu + CUDA: make_per_tensor_quantized_tensor_cuda + autogen: _make_per_tensor_quantized_tensor.out + +- func: _make_per_channel_quantized_tensor(Tensor self, Tensor scale, Tensor zero_point, int axis) -> Tensor + dispatch: + CPU: make_per_channel_quantized_tensor_cpu + CUDA: make_per_channel_quantized_tensor_cuda + autogen: _make_per_channel_quantized_tensor.out + +- func: qscheme(Tensor self) -> QScheme + variants: method + dispatch: + QuantizedCPU, QuantizedCUDA: qscheme_quant + +- func: fake_quantize_per_tensor_affine(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> Tensor + device_check: NoCheck # 
TensorIterator + variants: function + +- func: fake_quantize_per_tensor_affine.tensor_qparams(Tensor self, Tensor scale, Tensor zero_point, int quant_min, int quant_max) -> Tensor + device_check: NoCheck # TensorIterator + variants: function + +- func: fake_quantize_per_tensor_affine_cachemask(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor output, Tensor mask) + variants: function + dispatch: + CPU, CUDA: fake_quantize_per_tensor_affine_cachemask + autogen: fake_quantize_per_tensor_affine_cachemask.out + +- func: _fake_quantize_per_tensor_affine_cachemask_tensor_qparams(Tensor self, Tensor scale, Tensor zero_point, Tensor fake_quant_enabled, int quant_min, int quant_max) -> (Tensor output, Tensor mask) + variants: function + dispatch: + CPU, CUDA: _fake_quantize_per_tensor_affine_cachemask_tensor_qparams + autogen: _fake_quantize_per_tensor_affine_cachemask_tensor_qparams.out + +- func: fake_quantize_per_tensor_affine_cachemask_backward(Tensor grad, Tensor mask) -> Tensor + variants: function + +- func: _fake_quantize_learnable_per_tensor_affine(Tensor self, Tensor scale, Tensor zero_point, int quant_min, int quant_max, float grad_factor=1.0) -> Tensor + variants: function + dispatch: + CPU, CUDA: _fake_quantize_learnable_per_tensor_affine + autogen: _fake_quantize_learnable_per_tensor_affine.out + +- func: _fake_quantize_learnable_per_tensor_affine_backward(Tensor grad, Tensor self, Tensor scale, Tensor zero_point, int quant_min, int quant_max, float grad_factor=1.0) -> (Tensor, Tensor, Tensor) + variants: function + dispatch: + CPU, CUDA: _fake_quantize_learnable_per_tensor_affine_backward + +- func: fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> Tensor + device_check: NoCheck # TensorIterator + variants: function + +- func: fake_quantize_per_channel_affine_cachemask(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor output, Tensor mask) + variants: function + dispatch: + CPU, CUDA: fake_quantize_per_channel_affine_cachemask + autogen: fake_quantize_per_channel_affine_cachemask.out + +- func: fake_quantize_per_channel_affine_cachemask_backward(Tensor grad, Tensor mask) -> Tensor + variants: function + +- func: _fake_quantize_learnable_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max, float grad_factor=1.0) -> Tensor + variants: function + dispatch: + CPU, CUDA: _fake_quantize_learnable_per_channel_affine + autogen: _fake_quantize_learnable_per_channel_affine.out + +- func: _fake_quantize_learnable_per_channel_affine_backward(Tensor grad, Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max, float grad_factor=1.0) -> (Tensor, Tensor, Tensor) + variants: function + dispatch: + CPU, CUDA: _fake_quantize_learnable_per_channel_affine_backward + +- func: fused_moving_avg_obs_fake_quant(Tensor self, Tensor observer_on, Tensor fake_quant_on, Tensor(a!) running_min, Tensor(b!) running_max, Tensor(c!) scale, Tensor(d!) zero_point, float averaging_const, int quant_min, int quant_max, int ch_axis, bool per_row_fake_quant=False, bool symmetric_quant=False) -> Tensor + variants: function + +- func: _fused_moving_avg_obs_fq_helper(Tensor self, Tensor observer_on, Tensor fake_quant_on, Tensor(a!) running_min, Tensor(b!) running_max, Tensor(c!) scale, Tensor(d!) 
zero_point, float averaging_const, int quant_min, int quant_max, int ch_axis, bool per_row_fake_quant=False, bool symmetric_quant=False) -> (Tensor output, Tensor mask) + dispatch: + CPU: fused_moving_avg_obs_fake_quant_cpu + CUDA: fused_moving_avg_obs_fake_quant_cuda + autogen: _fused_moving_avg_obs_fq_helper_functional, _fused_moving_avg_obs_fq_helper.out + +- func: _choose_qparams_per_tensor(Tensor self, bool reduce_range=False) -> (float, int) + variants: function + +- func: _saturate_weight_to_fp16(Tensor weight) -> Tensor + variants: function + +- func: choose_qparams_optimized(Tensor input, int numel, int n_bins, float ratio, int bit_width) -> (Tensor, Tensor) + variants: function + +- func: _autocast_to_reduced_precision(Tensor(a) self, bool cuda_enabled, bool cpu_enabled, ScalarType cuda_dtype, ScalarType cpu_dtype) -> Tensor(a) + variants: method + device_guard: False + +- func: _autocast_to_full_precision(Tensor(a) self, bool cuda_enabled, bool cpu_enabled) -> Tensor(a) + variants: method + device_guard: False + +- func: _to_copy(Tensor self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, bool non_blocking=False, MemoryFormat? memory_format=None) -> Tensor + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: _to_copy + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: _to_copy_nested + autogen: _to_copy.out + tags: core + +# to(Device) must not exist because all constructors of Device also work for +# TensorOptions. Otherwise, an ambiguity error is thrown. +# See NOTE [ TensorOptions Constructors ]. +- func: to.dtype_layout(Tensor(a) self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, bool non_blocking=False, bool copy=False, MemoryFormat? memory_format=None) -> Tensor(a) + variants: method + device_check: NoCheck + device_guard: False + +- func: to.device(Tensor(a) self, Device device, ScalarType dtype, bool non_blocking=False, bool copy=False, MemoryFormat? memory_format=None) -> Tensor(a) + variants: method + device_check: NoCheck + device_guard: False + +- func: to.dtype(Tensor(a) self, ScalarType dtype, bool non_blocking=False, bool copy=False, MemoryFormat? memory_format=None) -> Tensor(a) + variants: method + device_check: NoCheck + device_guard: False + +- func: to.other(Tensor(a) self, Tensor other, bool non_blocking=False, bool copy=False, MemoryFormat? memory_format=None) -> Tensor(a) + variants: method + device_check: NoCheck + device_guard: False + +- func: meshgrid(Tensor[] tensors) -> Tensor[] + +# TODO: Two weeks after this lands, combine these two overloads, +# making "indexing" optional. These are temporarily distinct for +# forward-compatibility reasons.
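+# A small usage sketch of the meshgrid overload above and the indexing overload
+# below (illustrative only; assumes the public torch.meshgrid Python API):
+#
+#   >>> import torch
+#   >>> a, b = torch.arange(3), torch.arange(4)
+#   >>> X, Y = torch.meshgrid(a, b, indexing="ij")      # matrix indexing: shapes (3, 4)
+#   >>> Xxy, Yxy = torch.meshgrid(a, b, indexing="xy")  # Cartesian indexing: shapes (4, 3)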
+- func: meshgrid.indexing(Tensor[] tensors, *, str indexing) -> Tensor[] + +- func: cartesian_prod(Tensor[] tensors) -> Tensor + variants: function + tags: maybe_aliasing_or_mutating + +- func: combinations(Tensor self, int r=2, bool with_replacement=False) -> Tensor + variants: function + +- func: item(Tensor self) -> Scalar + tags: data_dependent_output + variants: method + +- func: result_type.Tensor(Tensor tensor, Tensor other) -> ScalarType + variants: function + +- func: result_type.Scalar(Tensor tensor, Scalar other) -> ScalarType + variants: function + +- func: result_type.Scalar_Tensor(Scalar scalar, Tensor tensor) -> ScalarType + variants: function + +- func: result_type.Scalar_Scalar(Scalar scalar1, Scalar scalar2) -> ScalarType + +- func: can_cast(ScalarType from_, ScalarType to) -> bool + variants: function + +- func: promote_types(ScalarType type1, ScalarType type2) -> ScalarType + variants: function + +# NB: Does NOT check the precondition that numel == 1 +- func: _local_scalar_dense(Tensor self) -> Scalar + tags: [core, data_dependent_output] + dispatch: + CPU: _local_scalar_dense_cpu + CUDA: _local_scalar_dense_cuda + MPS: _local_scalar_dense_mps + variants: function + +# MPS LSTM implementation + +- func: _lstm_mps(Tensor input, Tensor[] hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional, bool batch_first) -> (Tensor, Tensor, Tensor, Tensor, Tensor, Tensor) + dispatch: + MPS: _lstm_mps + autogen: _lstm_mps.out + tags: nondeterministic_seeded + +- func: lstm_mps_backward(Tensor? grad_y, Tensor? grad_hy, Tensor? grad_cy, Tensor z_state, Tensor cell_state_fwd, Tensor input, Tensor layersOutputs, Tensor[] hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional, bool batch_first) -> (Tensor, Tensor[], Tensor[]) + dispatch: + MPS: lstm_mps_backward + autogen: lstm_mps_backward.out + + +# Fused RNN kernels +- func: _thnn_fused_lstm_cell(Tensor input_gates, Tensor hidden_gates, Tensor cx, Tensor? input_bias=None, Tensor? hidden_bias=None) -> (Tensor, Tensor, Tensor) + dispatch: + CUDA: _thnn_fused_lstm_cell_cuda + autogen: _thnn_fused_lstm_cell.out + +# NB: The composite version of this function below is a simple wrapper that duplicates some of the outputs. +# This is necessary to avoid triggering TensorImpl use count checks in debug mode. +# NB: this function is NOT differentiable +- func: _thnn_fused_lstm_cell_backward_impl(Tensor? grad_hy, Tensor? grad_cy, Tensor cx, Tensor cy, Tensor workspace, bool has_bias) -> (Tensor, Tensor, Tensor) + dispatch: + CUDA: _thnn_fused_lstm_cell_backward_impl_cuda + autogen: _thnn_fused_lstm_cell_backward_impl.out + +- func: _thnn_fused_lstm_cell_backward(Tensor? grad_hy, Tensor? grad_cy, Tensor cx, Tensor cy, Tensor workspace, bool has_bias) -> (Tensor, Tensor, Tensor, Tensor, Tensor) + +- func: _thnn_differentiable_lstm_cell_backward(Tensor? grad_hy, Tensor? grad_cy, Tensor input_gates, Tensor hidden_gates, Tensor? input_bias, Tensor? hidden_bias, Tensor cx, Tensor cy) -> (Tensor, Tensor, Tensor, Tensor, Tensor) + +- func: _thnn_fused_gru_cell(Tensor input_gates, Tensor hidden_gates, Tensor hx, Tensor? input_bias=None, Tensor?
hidden_bias=None) -> (Tensor, Tensor) + dispatch: + CUDA: _thnn_fused_gru_cell_cuda + autogen: _thnn_fused_gru_cell.out + +- func: _thnn_fused_gru_cell_backward(Tensor grad_hy, Tensor workspace, bool has_bias) -> (Tensor, Tensor, Tensor, Tensor, Tensor) + dispatch: + CUDA: _thnn_fused_gru_cell_backward_cuda + autogen: _thnn_fused_gru_cell_backward.out + +- func: _thnn_differentiable_gru_cell_backward(Tensor grad_hy, Tensor input_gates, Tensor hidden_gates, Tensor hx, Tensor? input_bias, Tensor? hidden_bias) -> (Tensor, Tensor, Tensor, Tensor, Tensor) + +# RNN cells and layers +- func: lstm.input(Tensor input, Tensor[] hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional, bool batch_first) -> (Tensor, Tensor, Tensor) + tags: nondeterministic_seeded + +- func: lstm.data(Tensor data, Tensor batch_sizes, Tensor[] hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional) -> (Tensor, Tensor, Tensor) + tags: nondeterministic_seeded + +- func: gru.input(Tensor input, Tensor hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional, bool batch_first) -> (Tensor, Tensor) + tags: nondeterministic_seeded + +- func: gru.data(Tensor data, Tensor batch_sizes, Tensor hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional) -> (Tensor, Tensor) + tags: nondeterministic_seeded + +- func: rnn_tanh.input(Tensor input, Tensor hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional, bool batch_first) -> (Tensor, Tensor) + tags: nondeterministic_seeded + +- func: rnn_tanh.data(Tensor data, Tensor batch_sizes, Tensor hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional) -> (Tensor, Tensor) + tags: nondeterministic_seeded + +- func: rnn_relu.input(Tensor input, Tensor hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional, bool batch_first) -> (Tensor, Tensor) + tags: nondeterministic_seeded + +- func: rnn_relu.data(Tensor data, Tensor batch_sizes, Tensor hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional) -> (Tensor, Tensor) + tags: nondeterministic_seeded + +- func: lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor) + +- func: gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> Tensor + +- func: rnn_tanh_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> Tensor + +- func: rnn_relu_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> Tensor + +# Quantized RNN layer registration has been moved to C10 dispatch in `RNN.cpp` + +# Quantized RNN layers +# - func: quantized_lstm(Tensor input, Tensor[] hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional, bool batch_first, *, ScalarType? dtype=None, bool use_dynamic=False) -> (Tensor, Tensor, Tensor) + + +# - func: quantized_lstm.data(Tensor data, Tensor batch_sizes, Tensor[] hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional, *, ScalarType? 
dtype=None, bool use_dynamic=False) -> (Tensor, Tensor, Tensor) + + +# Quantized GRU layers + +# - func: quantized_gru.input(Tensor input, Tensor hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional, bool batch_first) -> (Tensor, Tensor) +# + +# - func: quantized_gru.data(Tensor data, Tensor batch_sizes, Tensor hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional) -> (Tensor, Tensor) +# + +# Quantized RNN cells +- func: quantized_lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor b_ih, Tensor b_hh, Tensor packed_ih, Tensor packed_hh, Tensor col_offsets_ih, Tensor col_offsets_hh, Scalar scale_ih, Scalar scale_hh, Scalar zero_point_ih, Scalar zero_point_hh) -> (Tensor, Tensor) + +- func: quantized_gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor b_ih, Tensor b_hh, Tensor packed_ih, Tensor packed_hh, Tensor col_offsets_ih, Tensor col_offsets_hh, Scalar scale_ih, Scalar scale_hh, Scalar zero_point_ih, Scalar zero_point_hh) -> Tensor + +- func: quantized_rnn_relu_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor b_ih, Tensor b_hh, Tensor packed_ih, Tensor packed_hh, Tensor col_offsets_ih, Tensor col_offsets_hh, Scalar scale_ih, Scalar scale_hh, Scalar zero_point_ih, Scalar zero_point_hh) -> Tensor + +- func: quantized_rnn_tanh_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor b_ih, Tensor b_hh, Tensor packed_ih, Tensor packed_hh, Tensor col_offsets_ih, Tensor col_offsets_hh, Scalar scale_ih, Scalar scale_hh, Scalar zero_point_ih, Scalar zero_point_hh) -> Tensor + +# PackedSequence utilities +- func: _pack_padded_sequence(Tensor input, Tensor lengths, bool batch_first) -> (Tensor, Tensor) + dispatch: + CompositeExplicitAutograd: _pack_padded_sequence + autogen: _pack_padded_sequence.out + +- func: _pack_padded_sequence_backward(Tensor grad, SymInt[] input_size, Tensor batch_sizes, bool batch_first) -> Tensor + dispatch: + CompositeImplicitAutograd: _pack_padded_sequence_backward_symint + +- func: _pad_packed_sequence(Tensor data, Tensor batch_sizes, bool batch_first, Scalar padding_value, int total_length) -> (Tensor, Tensor) + +# wrappers for legacy TH methods + +- func: set_.source_Storage(Tensor(a!) self, Storage source) -> Tensor(a!) + variants: method + device_check: NoCheck + device_guard: False + dispatch: + CPU, CUDA, Meta, MPS: set_ + autogen: set.source_Storage, set.source_Storage_out + tags: inplace_view + +- func: set_.source_Storage_storage_offset(Tensor(a!) self, Storage source, SymInt storage_offset, SymInt[] size, SymInt[] stride=[]) -> Tensor(a!) + variants: method + device_check: NoCheck + device_guard: False + dispatch: + CPU: set_storage_cpu_ + Meta: set_storage_meta__symint + CUDA: set_storage_cuda_ + MPS: set_storage_mps_ + QuantizedCPU, QuantizedCUDA: set_storage_quantized_ + autogen: set.source_Storage_storage_offset, set.source_Storage_storage_offset_out + tags: inplace_view + +- func: set_.source_Tensor_storage_offset(Tensor(a!) self, Tensor source, SymInt storage_offset, SymInt[] size, SymInt[] stride=[]) -> Tensor(a!) + variants: method + device_check: NoCheck + device_guard: False + dispatch: + CompositeImplicitAutograd: set__symint + tags: inplace_view + +- func: set_.source_Tensor(Tensor(a!) self, Tensor source) -> Tensor(a!) 
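+# Usage sketch for the set_ overloads above and is_set_to below (illustrative
+# only; assumes the public Python API, including Tensor.untyped_storage()):
+#
+#   >>> import torch
+#   >>> src = torch.arange(6.)
+#   >>> t = torch.empty(0)
+#   >>> _ = t.set_(src)        # t now aliases src's storage, sizes, and strides
+#   >>> t.is_set_to(src)
+#   True
+#   >>> _ = t.set_(src.untyped_storage(), 0, (2, 3))   # reinterpret the same storage as 2x3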
+ variants: method + device_check: NoCheck + device_guard: False + dispatch: + CPU, CUDA, Meta, MPS: set_tensor_ + autogen: set.source_Tensor, set.source_Tensor_out + tags: inplace_view + +- func: set_(Tensor(a!) self) -> Tensor(a!) + variants: method + dispatch: + CPU: set_cpu_ + CUDA: set_cuda_ + Meta: set_meta_ + MPS: set_mps_ + autogen: set, set.out + tags: inplace_view + +# Not making it CompositeImplicitAutograd because lift +# should be a primitive w.r.t. functorch + +# TODO: this should have a view annotation +# TODO: shouldn't be a method +- func: lift(Tensor self) -> Tensor + dispatch: + CompositeExplicitAutograd: lift + autogen: lift.out + +# lift_fresh is called with an argument that is guaranteed to be +# fresh (i.e., newly allocated). This is ONLY called from a +# torch.tensor call; if you FX trace a lift_fresh, you are obligated +# to convert this into a lift_fresh_copy (because FX will violate the +# freshness invariant when tracing). +- func: lift_fresh(Tensor(a) self) -> Tensor(a) + dispatch: + CompositeExplicitAutograd: lift_fresh + +# Like lift, but it clones the input. +- func: lift_fresh_copy(Tensor self) -> Tensor + tags: view_copy + dispatch: + CompositeExplicitAutogradNonFunctional: lift_fresh_copy + autogen: lift_fresh_copy.out + +- func: is_set_to(Tensor self, Tensor tensor) -> bool + variants: method + device_check: NoCheck + device_guard: False + dispatch: + CPU, CUDA, MPS: is_set_to + +- func: masked_fill_.Scalar(Tensor(a!) self, Tensor mask, Scalar value) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CPU: masked_fill__cpu + CUDA: masked_fill__cuda + QuantizedCPU: masked_fill__quantized_cpu + QuantizedCUDA: masked_fill__quantized_cuda + MPS: masked_fill__mps + autogen: masked_fill.Scalar_out + +- func: masked_fill.Scalar(Tensor self, Tensor mask, Scalar value) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: masked_fill + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_masked_fill + tags: pointwise + +- func: masked_fill_.Tensor(Tensor(a!) self, Tensor mask, Tensor value) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CPU: masked_fill__cpu + CUDA: masked_fill__cuda + QuantizedCPU: masked_fill__quantized_cpu + QuantizedCUDA: masked_fill__quantized_cuda + MPS: masked_fill__mps + autogen: masked_fill.Tensor_out + +- func: masked_fill.Tensor(Tensor self, Tensor mask, Tensor value) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: masked_fill + +- func: masked_scatter_(Tensor(a!) self, Tensor mask, Tensor source) -> Tensor(a!) + variants: method + dispatch: + CPU: masked_scatter__cpu + CUDA: masked_scatter__cuda + MPS: masked_scatter__mps + autogen: masked_scatter.out + +- func: masked_scatter(Tensor self, Tensor mask, Tensor source) -> Tensor + variants: function, method + dispatch: + CompositeExplicitAutograd: masked_scatter + tags: core + +- func: masked_scatter_backward(Tensor grad_output, Tensor mask, SymInt[] sizes) -> Tensor + dispatch: + CompositeExplicitAutograd: masked_scatter_backward_symint + +- func: _masked_softmax(Tensor self, Tensor mask, int? dim=None, int? mask_type=None) -> Tensor + dispatch: + CUDA: masked_softmax_cuda + CPU: masked_softmax_cpu + autogen: _masked_softmax.out + +- func: _masked_softmax_backward(Tensor grad_output, Tensor output, Tensor mask, int? 
dim=None) -> Tensor + dispatch: + CUDA: masked_softmax_backward_cuda + CPU: masked_softmax_backward_cpu + autogen: _masked_softmax_backward.out + +- func: view(Tensor(a) self, SymInt[] size) -> Tensor(a) + variants: method + device_check: NoCheck + device_guard: False + dispatch: + ZeroTensor, Meta, CPU, CUDA, QuantizedCPU, QuantizedCUDA, MPS, MTIA: view + MkldnnCPU: mkldnn_view + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: view_nested + tags: core + +# Warning: If you want to change the name or overload name of this +# operator, you might also want to change the `isBlockListedSchema` +# function in `torch/csrc/jit/frontend/schema_catching.cpp`. +# The name and overload name of this operator are hardcoded in that +# function in order to work around a bug: +# https://github.com/pytorch/pytorch/issues/47964 +- func: view.dtype(Tensor(a) self, ScalarType dtype) -> Tensor(a) + variants: method + device_check: NoCheck + device_guard: False + dispatch: + CompositeExplicitAutograd: view_dtype + +- func: put_(Tensor(a!) self, Tensor index, Tensor source, bool accumulate=False) -> Tensor(a!) + variants: method + dispatch: + CPU, CUDA: put_ + autogen: put.out + +- func: put(Tensor self, Tensor index, Tensor source, bool accumulate=False) -> Tensor + variants: function, method + dispatch: + CompositeExplicitAutograd: put + +- func: index_add.out(Tensor self, int dim, Tensor index, Tensor source, *, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!) + structured: True + variants: function + precomputed: + - dim -> int dim + dispatch: + CPU: index_add_cpu_out + CUDA: index_add_cuda_out + MPS: index_add_mps_out + +- func: index_add_(Tensor(a!) self, int dim, Tensor index, Tensor source, *, Scalar alpha=1) -> Tensor(a!) + structured_delegate: index_add.out + variants: method + +- func: index_add(Tensor self, int dim, Tensor index, Tensor source, *, Scalar alpha=1) -> Tensor + structured_delegate: index_add.out + variants: function, method + +- func: index_add.dimname(Tensor self, Dimname dim, Tensor index, Tensor source, *, Scalar alpha=1) -> Tensor + variants: function, method + +- func: index_reduce.out(Tensor self, int dim, Tensor index, Tensor source, str reduce, *, bool include_self=True, Tensor(a!) out) -> Tensor(a!) + structured: True + variants: function + precomputed: + - dim -> int dim + dispatch: + CPU: index_reduce_cpu_out + CUDA: index_reduce_cuda_out + +- func: index_reduce_(Tensor(a!) self, int dim, Tensor index, Tensor source, str reduce, *, bool include_self=True) -> Tensor(a!) + structured_delegate: index_reduce.out + variants: method + +- func: index_reduce(Tensor self, int dim, Tensor index, Tensor source, str reduce, *, bool include_self=True) -> Tensor + structured_delegate: index_reduce.out + variants: function, method + +- func: index_fill_.int_Scalar(Tensor(a!) self, int dim, Tensor index, Scalar value) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CPU: index_fill_ + CUDA: index_fill_ + MPS: index_fill_mps_ + autogen: index_fill.int_Scalar_out + +- func: index_fill.int_Scalar(Tensor self, int dim, Tensor index, Scalar value) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: index_fill + +- func: index_fill_.int_Tensor(Tensor(a!) self, int dim, Tensor index, Tensor value) -> Tensor(a!)
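+# Usage sketch for the index_add / index_reduce / index_fill families above
+# (illustrative only; public Python API):
+#
+#   >>> import torch
+#   >>> x = torch.zeros(3, 4)
+#   >>> idx = torch.tensor([0, 2])
+#   >>> src = torch.ones(2, 4)
+#   >>> _ = x.index_add_(0, idx, src, alpha=2.)    # rows 0 and 2 += 2 * src rows
+#   >>> _ = x.index_reduce_(0, idx, src, "prod")   # rows 0 and 2 *= src rows (include_self=True)
+#   >>> _ = x.index_fill_(0, idx, -1.)             # overwrite rows 0 and 2 with -1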
+ device_check: NoCheck # TensorIterator + variants: method + dispatch: + CPU, CUDA: index_fill_ + MPS: index_fill_mps_ + autogen: index_fill.int_Tensor_out + +- func: index_fill.int_Tensor(Tensor self, int dim, Tensor index, Tensor value) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + dispatch: + CompositeExplicitAutograd: index_fill + +- func: index_fill_.Dimname_Scalar(Tensor(a!) self, Dimname dim, Tensor index, Scalar value) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + +- func: index_fill_.Dimname_Tensor(Tensor(a!) self, Dimname dim, Tensor index, Tensor value) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + +- func: index_fill.Dimname_Scalar(Tensor self, Dimname dim, Tensor index, Scalar value) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + +- func: index_fill.Dimname_Tensor(Tensor self, Dimname dim, Tensor index, Tensor value) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + +- func: scatter.src(Tensor self, int dim, Tensor index, Tensor src) -> Tensor + structured_delegate: scatter.src_out + variants: function, method + tags: core + +- func: scatter_.src(Tensor(a!) self, int dim, Tensor index, Tensor src) -> Tensor(a!) + structured_delegate: scatter.src_out + variants: method + +- func: scatter.src_out(Tensor self, int dim, Tensor index, Tensor src, *, Tensor(a!) out) -> Tensor(a!) + structured: True + variants: function + dispatch: + CPU, CUDA: scatter_src_out + MPS: scatter_src_out_mps + +- func: scatter.value(Tensor self, int dim, Tensor index, Scalar value) -> Tensor + structured_delegate: scatter.value_out + variants: function, method + tags: core + +- func: scatter_.value(Tensor(a!) self, int dim, Tensor index, Scalar value) -> Tensor(a!) + structured_delegate: scatter.value_out + variants: method + +- func: scatter.value_out(Tensor self, int dim, Tensor index, Scalar value, *, Tensor(a!) out) -> Tensor(a!) + structured: True + variants: function + dispatch: + CPU, CUDA: scatter_value_out + MPS: scatter_value_out_mps + +- func: scatter.reduce(Tensor self, int dim, Tensor index, Tensor src, *, str reduce) -> Tensor + structured_delegate: scatter.reduce_out + variants: function, method + +- func: scatter_.reduce(Tensor(a!) self, int dim, Tensor index, Tensor src, *, str reduce) -> Tensor(a!) + structured_delegate: scatter.reduce_out + variants: method + +- func: scatter.reduce_out(Tensor self, int dim, Tensor index, Tensor src, *, str reduce, Tensor(a!) out) -> Tensor(a!) + structured: True + variants: function + dispatch: + CPU, CUDA: scatter_reduce_out + MPS: scatter_reduce_out_mps + +- func: scatter.value_reduce(Tensor self, int dim, Tensor index, Scalar value, *, str reduce) -> Tensor + structured_delegate: scatter.value_reduce_out + variants: function, method + +- func: scatter_.value_reduce(Tensor(a!) self, int dim, Tensor index, Scalar value, *, str reduce) -> Tensor(a!) + structured_delegate: scatter.value_reduce_out + variants: method + +- func: scatter.value_reduce_out(Tensor self, int dim, Tensor index, Scalar value, *, str reduce, Tensor(a!) out) -> Tensor(a!) 
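+# Usage sketch for the scatter overloads above (illustrative only; public API):
+#
+#   >>> import torch
+#   >>> x = torch.zeros(2, 4)
+#   >>> index = torch.tensor([[0, 1], [2, 3]])
+#   >>> src = torch.ones(2, 2)
+#   >>> _ = x.scatter_(1, index, src)                  # .src overload
+#   >>> _ = x.scatter_(1, index, 5.)                   # .value overload
+#   >>> _ = x.scatter_(1, index, src, reduce="add")    # .reduce overload ("add"/"multiply")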
+ structured: True + variants: function + dispatch: + CPU, CUDA: scatter_value_reduce_out + MPS: scatter_value_reduce_out_mps + +- func: scatter.dimname_src(Tensor self, Dimname dim, Tensor index, Tensor src) -> Tensor + variants: function, method + +- func: scatter.dimname_value(Tensor self, Dimname dim, Tensor index, Scalar value) -> Tensor + variants: function, method + +- func: scatter_add(Tensor self, int dim, Tensor index, Tensor src) -> Tensor + structured_delegate: scatter_add.out + variants: function, method + tags: core + +- func: scatter_add_(Tensor(a!) self, int dim, Tensor index, Tensor src) -> Tensor(a!) + structured_delegate: scatter_add.out + variants: method + +- func: scatter_add.out(Tensor self, int dim, Tensor index, Tensor src, *, Tensor(a!) out) -> Tensor(a!) + structured: True + variants: function + dispatch: + CPU, CUDA: scatter_add + MPS: scatter_add_mps_out + +- func: scatter_add.dimname(Tensor self, Dimname dim, Tensor index, Tensor src) -> Tensor + variants: function, method + +- func: scatter_reduce.two(Tensor self, int dim, Tensor index, Tensor src, str reduce, *, bool include_self=True) -> Tensor + structured_delegate: scatter_reduce.two_out + variants: function, method + tags: core + +- func: scatter_reduce_.two(Tensor(a!) self, int dim, Tensor index, Tensor src, str reduce, *, bool include_self=True) -> Tensor(a!) + structured_delegate: scatter_reduce.two_out + variants: method + +- func: scatter_reduce.two_out(Tensor self, int dim, Tensor index, Tensor src, str reduce, *, bool include_self=True, Tensor(a!) out) -> Tensor(a!) + structured: True + variants: function + dispatch: + CPU, CUDA, MPS: scatter_reduce_two + +- func: eq_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + structured_delegate: eq.Scalar_out + device_check: NoCheck # TensorIterator + variants: method + +- func: eq_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + structured_delegate: eq.Tensor_out + device_check: NoCheck # TensorIterator + variants: method + +- func: bitwise_and.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + variants: function + dispatch: + CPU, CUDA, MTIA: bitwise_and_out + MPS: bitwise_and_out_mps + tags: pointwise + +- func: bitwise_and.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function + dispatch: + CompositeExplicitAutograd: bitwise_and_out + tags: pointwise + +- func: bitwise_and.Scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + CompositeExplicitAutograd: bitwise_and + tags: [core, pointwise] + +- func: bitwise_and.Scalar_Tensor(Scalar self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function + dispatch: + CompositeExplicitAutograd: bitwise_and + autogen: bitwise_and.Scalar_Tensor_out + tags: pointwise + +- func: bitwise_and.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + structured_delegate: bitwise_and.Tensor_out + tags: [core, pointwise] + +- func: bitwise_and_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CompositeExplicitAutograd: bitwise_and_ + tags: pointwise + +- func: bitwise_and_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) 
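+# Usage sketch for scatter_add / scatter_reduce above (illustrative only;
+# public Python API):
+#
+#   >>> import torch
+#   >>> x = torch.zeros(4)
+#   >>> _ = x.scatter_add_(0, torch.tensor([0, 0, 3]), torch.tensor([1., 2., 3.]))
+#   >>> x
+#   tensor([3., 0., 0., 3.])
+#   >>> _ = x.scatter_reduce_(0, torch.tensor([0, 3]), torch.tensor([5., 1.]), "amax")
+#   >>> x
+#   tensor([5., 0., 0., 3.])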
+ device_check: NoCheck # TensorIterator + variants: method + structured_delegate: bitwise_and.Tensor_out + tags: pointwise + +- func: __and__.Scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + +- func: __and__.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + +- func: __iand__.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + +- func: __iand__.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + +- func: bitwise_or.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + variants: function + dispatch: + CPU, CUDA, MTIA: bitwise_or_out + MPS: bitwise_or_out_mps + tags: pointwise + +- func: bitwise_or.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function + dispatch: + CompositeExplicitAutograd: bitwise_or_out + tags: pointwise + +- func: bitwise_or.Scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + CompositeExplicitAutograd: bitwise_or + tags: [core, pointwise] + +- func: bitwise_or.Scalar_Tensor(Scalar self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function + dispatch: + CompositeExplicitAutograd: bitwise_or + autogen: bitwise_or.Scalar_Tensor_out + tags: pointwise + +- func: bitwise_or.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + structured_delegate: bitwise_or.Tensor_out + tags: [core, pointwise] + +- func: bitwise_or_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CompositeExplicitAutograd: bitwise_or_ + tags: pointwise + +- func: bitwise_or_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + structured_delegate: bitwise_or.Tensor_out + tags: pointwise + +- func: __or__.Scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + +- func: __or__.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + +- func: __ior__.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + +- func: __ior__.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + +- func: bitwise_xor.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + variants: function + dispatch: + CPU, CUDA: bitwise_xor_out + MPS: bitwise_xor_out_mps + tags: pointwise + +- func: bitwise_xor.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) 
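+# The Python operators `&`, `|`, `^` (and their in-place forms) map onto these
+# bitwise ops via the __and__/__or__/__xor__ entries here. A small sketch
+# (illustrative only; public API):
+#
+#   >>> import torch
+#   >>> a, b = torch.tensor([12, 10]), torch.tensor([10, 6])
+#   >>> (a & b, a | b, a ^ b)
+#   (tensor([8, 2]), tensor([14, 14]), tensor([6, 12]))
+#   >>> a &= b     # in-place, via __iand__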
+ device_check: NoCheck # TensorIterator + variants: function + dispatch: + CompositeExplicitAutograd: bitwise_xor_out + tags: pointwise + +- func: bitwise_xor.Scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + CompositeExplicitAutograd: bitwise_xor + tags: [core, pointwise] + +- func: bitwise_xor.Scalar_Tensor(Scalar self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function + dispatch: + CompositeExplicitAutograd: bitwise_xor + autogen: bitwise_xor.Scalar_Tensor_out + tags: pointwise + +- func: bitwise_xor.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + structured_delegate: bitwise_xor.Tensor_out + tags: [core, pointwise] + +- func: bitwise_xor_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CompositeExplicitAutograd: bitwise_xor_ + tags: pointwise + +- func: bitwise_xor_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + structured_delegate: bitwise_xor.Tensor_out + tags: pointwise + +- func: __xor__.Scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + tags: pointwise + +- func: __xor__.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + tags: pointwise + +- func: __ixor__.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + tags: pointwise + +- func: __ixor__.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + tags: pointwise + +- func: __lshift__.Scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + CPU, CUDA, MPS: __lshift__ + tags: pointwise + +- func: __lshift__.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + CPU, CUDA, MPS: __lshift__ + tags: pointwise + +- func: __ilshift__.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CPU, CUDA, MPS: __ilshift__ + autogen: __lshift__.Scalar_out + tags: pointwise + +- func: __ilshift__.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CPU, CUDA, MPS: __ilshift__ + autogen: __lshift__.Tensor_out + tags: pointwise + +- func: bitwise_left_shift.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: bitwise_left_shift.Tensor_out + tags: pointwise + +- func: bitwise_left_shift_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + structured_delegate: bitwise_left_shift.Tensor_out + tags: pointwise + +- func: bitwise_left_shift.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) 
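+# `<<` dispatches to __lshift__, which matches bitwise_left_shift elementwise
+# (likewise `>>` and bitwise_right_shift below). Sketch (illustrative only):
+#
+#   >>> import torch
+#   >>> x = torch.tensor([1, 2, 4])
+#   >>> x << 1
+#   tensor([2, 4, 8])
+#   >>> torch.bitwise_left_shift(x, 1).equal(x << 1)
+#   True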
+ device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: bitwise_left_shift_out + tags: pointwise + +- func: bitwise_left_shift.Tensor_Scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + CompositeExplicitAutograd: bitwise_left_shift + tags: pointwise + +- func: bitwise_left_shift_.Tensor_Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CompositeExplicitAutograd: bitwise_left_shift_ + tags: pointwise + +- func: bitwise_left_shift.Tensor_Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function + dispatch: + CompositeExplicitAutograd: bitwise_left_shift_out + tags: pointwise + +- func: bitwise_left_shift.Scalar_Tensor(Scalar self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function + dispatch: + CompositeExplicitAutograd: bitwise_left_shift + autogen: bitwise_left_shift.Scalar_Tensor_out + tags: pointwise + +- func: __rshift__.Scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + CPU, CUDA, MPS: __rshift__ + tags: pointwise + +- func: __rshift__.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + CPU, CUDA, MPS: __rshift__ + tags: pointwise + +- func: __irshift__.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CPU, CUDA, MPS: __irshift__ + autogen: __rshift__.Scalar_out + +- func: __irshift__.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CPU, CUDA, MPS: __irshift__ + autogen: __rshift__.Tensor_out + +- func: bitwise_right_shift.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function, method + structured_delegate: bitwise_right_shift.Tensor_out + tags: pointwise + +- func: bitwise_right_shift_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + structured_delegate: bitwise_right_shift.Tensor_out + tags: pointwise + +- func: bitwise_right_shift.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: bitwise_right_shift_out + tags: pointwise + +- func: bitwise_right_shift.Tensor_Scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + CompositeExplicitAutograd: bitwise_right_shift + tags: pointwise + +- func: bitwise_right_shift_.Tensor_Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CompositeExplicitAutograd: bitwise_right_shift_ + tags: pointwise + +- func: bitwise_right_shift.Tensor_Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) 
+ device_check: NoCheck # TensorIterator + variants: function + dispatch: + CompositeExplicitAutograd: bitwise_right_shift_out + tags: pointwise + +- func: bitwise_right_shift.Scalar_Tensor(Scalar self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function + dispatch: + CompositeExplicitAutograd: bitwise_right_shift + autogen: bitwise_right_shift.Scalar_Tensor_out + tags: pointwise + +- func: tril_(Tensor(a!) self, SymInt diagonal=0) -> Tensor(a!) + structured_delegate: tril.out + variants: method + +- func: triu_(Tensor(a!) self, SymInt diagonal=0) -> Tensor(a!) + structured_delegate: triu.out + variants: method + +- func: digamma_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: digamma.out + variants: method + tags: pointwise + +- func: lerp_.Scalar(Tensor(a!) self, Tensor end, Scalar weight) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + structured_delegate: lerp.Scalar_out + tags: pointwise + +- func: lerp_.Tensor(Tensor(a!) self, Tensor end, Tensor weight) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + structured_delegate: lerp.Tensor_out + tags: pointwise + +- func: addbmm_(Tensor(a!) self, Tensor batch1, Tensor batch2, *, Scalar beta=1, Scalar alpha=1) -> Tensor(a!) + variants: method + dispatch: + CPU, CUDA, XPU: addbmm_ + MPS: addbmm_mps_ + +- func: addbmm.out(Tensor self, Tensor batch1, Tensor batch2, *, Scalar beta=1, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA, XPU: addbmm_out + MPS: addbmm_out_mps + +- func: addbmm(Tensor self, Tensor batch1, Tensor batch2, *, Scalar beta=1, Scalar alpha=1) -> Tensor + variants: method, function + dispatch: + CPU, CUDA, XPU: addbmm + MPS: addbmm_mps + +- func: random_.from(Tensor(a!) self, int from, int? to, *, Generator? generator=None) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + tags: nondeterministic_seeded + dispatch: + CPU, CUDA: random_ + Meta: random_meta_ + MPS: random_mps_ + autogen: random.from, random.from_out + +- func: random_.to(Tensor(a!) self, int to, *, Generator? generator=None) -> Tensor(a!) + device_check: NoCheck # TensorIterator + tags: nondeterministic_seeded + variants: method + dispatch: + CPU, CUDA: random_ + Meta: random_meta_ + MPS: random_mps_ + autogen: random.to, random.to_out + +- func: random_(Tensor(a!) self, *, Generator? generator=None) -> Tensor(a!) + device_check: NoCheck # TensorIterator + tags: nondeterministic_seeded + variants: method + dispatch: + CPU, CUDA: random_ + MPS: random_mps_ + Meta: random_meta_ + autogen: random, random.out + +- func: uniform_(Tensor(a!) self, float from=0, float to=1, *, Generator? generator=None) -> Tensor(a!) + device_check: NoCheck # TensorIterator + tags: nondeterministic_seeded + variants: method + dispatch: + CPU, CUDA: uniform_ + MPS: uniform_mps_ + Meta: uniform_meta_ + autogen: uniform, uniform.out + +- func: cauchy_(Tensor(a!) self, float median=0, float sigma=1, *, Generator? generator=None) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + tags: nondeterministic_seeded + dispatch: + CPU, CUDA: cauchy_ + autogen: cauchy, cauchy.out + +- func: log_normal_(Tensor(a!) self, float mean=1, float std=2, *, Generator? generator=None) -> Tensor(a!) + device_check: NoCheck # TensorIterator + tags: nondeterministic_seeded + variants: method + dispatch: + CPU, CUDA: log_normal_ + autogen: log_normal, log_normal.out + +- func: exponential_(Tensor(a!) 
self, float lambd=1, *, Generator? generator=None) -> Tensor(a!) + device_check: NoCheck # TensorIterator + tags: nondeterministic_seeded + variants: method + dispatch: + CPU, CUDA: exponential_ + MPS: exponential_mps_ + autogen: exponential, exponential.out + +- func: geometric_(Tensor(a!) self, float p, *, Generator? generator=None) -> Tensor(a!) + device_check: NoCheck # TensorIterator + tags: nondeterministic_seeded + variants: method + dispatch: + CPU, CUDA: geometric_ + autogen: geometric, geometric.out + +# wrappers for TH functions + +- func: diag.out(Tensor self, int diagonal=0, *, Tensor(a!) out) -> Tensor(a!) + +- func: diag(Tensor self, int diagonal=0) -> Tensor + variants: method, function + +- func: cross.out(Tensor self, Tensor other, int? dim=None, *, Tensor(a!) out) -> Tensor(a!) + +- func: cross(Tensor self, Tensor other, int? dim=None) -> Tensor + variants: method, function + +- func: triu.out(Tensor self, SymInt diagonal=0, *, Tensor(a!) out) -> Tensor(a!) + structured: True + dispatch: + CPU: triu_cpu + CUDA: triu_cuda + MPS: triu_mps_out + +- func: triu(Tensor self, SymInt diagonal=0) -> Tensor + structured_delegate: triu.out + variants: method, function + +- func: tril.out(Tensor self, SymInt diagonal=0, *, Tensor(a!) out) -> Tensor(a!) + structured: True + dispatch: + CPU: tril_cpu + CUDA: tril_cuda + MPS: tril_mps_out + +- func: tril(Tensor self, SymInt diagonal=0) -> Tensor + structured_delegate: tril.out + variants: method, function + +- func: tril_indices(int row, int col, int offset=0, *, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CPU: tril_indices_cpu + CUDA: tril_indices_cuda + MPS: tril_indices_mps + autogen: tril_indices.out + +- func: triu_indices(int row, int col, int offset=0, *, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CPU: triu_indices_cpu + CUDA: triu_indices_cuda + MPS: triu_indices_mps + autogen: triu_indices.out + +- func: trace(Tensor self) -> Tensor + variants: method, function + dispatch: + CPU: trace_cpu + CUDA: trace_cuda + MPS: trace_mps + autogen: trace.out + +- func: trace_backward(Tensor grad, SymInt[] sizes) -> Tensor + variants: function + device_check: NoCheck + device_guard: False + dispatch: + CompositeImplicitAutograd: trace_backward_symint + +- func: ne.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: ne_Scalar_out + MPS: ne_scalar_out_mps + QuantizedCPU: ne_out_quantized_cpu + tags: pointwise + +- func: ne.Scalar(Tensor self, Scalar other) -> Tensor + structured_delegate: ne.Scalar_out + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + QuantizedCPU: ne_quantized_cpu + tags: [core, pointwise] + +- func: ne.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: ne_Tensor_out + MPS: ne_tensor_out_mps + QuantizedCPU: ne_out_quantized_cpu + tags: pointwise + +- func: ne.Tensor(Tensor self, Tensor other) -> Tensor + structured_delegate: ne.Tensor_out + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + QuantizedCPU: ne_quantized_cpu + tags: [core, pointwise] + +- func: ne_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!)
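+# [editorial note, not upstream text] The comparison entries here (ne/eq/ge/le/gt/lt)
+# use the structured-kernel pattern: the out= overload carries `structured: True` plus
+# the dispatch table, and the functional and in-place variants point at it via
+# `structured_delegate`, so all three share one meta/impl kernel pair. Illustrative:
+#   >>> import torch
+#   >>> torch.ne(torch.tensor([1, 2, 3]), 2)   # functional form, routed through ne.Scalar_out
+#   tensor([ True, False,  True])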
+ structured_delegate: ne.Scalar_out + device_check: NoCheck # TensorIterator + variants: method + +- func: ne_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + structured_delegate: ne.Tensor_out + device_check: NoCheck # TensorIterator + variants: method + +# not_equal, alias for torch.ne +- func: not_equal.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + +- func: not_equal.Scalar(Tensor self, Scalar other) -> Tensor + variants: method, function + +- func: not_equal.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + +- func: not_equal.Tensor(Tensor self, Tensor other) -> Tensor + variants: method, function + +- func: not_equal_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + variants: method + +- func: not_equal_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + variants: method + +- func: eq.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: eq_Scalar_out + MPS: eq_scalar_out_mps + QuantizedCPU: eq_out_quantized_cpu + tags: pointwise + +- func: eq.Scalar(Tensor self, Scalar other) -> Tensor + structured_delegate: eq.Scalar_out + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + QuantizedCPU: eq_quantized_cpu + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: eq_scalar_nested + tags: [core, pointwise] + +- func: eq.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: eq_Tensor_out + MPS: eq_tensor_out_mps + QuantizedCPU: eq_out_quantized_cpu + tags: pointwise + +- func: eq.Tensor(Tensor self, Tensor other) -> Tensor + structured_delegate: eq.Tensor_out + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + QuantizedCPU: eq_quantized_cpu + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: eq_tensor_nested + tags: [core, pointwise] + +- func: ge.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: ge_Scalar_out + MPS: ge_scalar_out_mps + QuantizedCPU: ge_out_quantized_cpu + tags: pointwise + +- func: ge.Scalar(Tensor self, Scalar other) -> Tensor + structured_delegate: ge.Scalar_out + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + QuantizedCPU: ge_quantized_cpu + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: ge_scalar_nested + tags: [core, pointwise] + +- func: ge.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: ge_Tensor_out + MPS: ge_tensor_out_mps + QuantizedCPU: ge_out_quantized_cpu + tags: pointwise + +- func: ge.Tensor(Tensor self, Tensor other) -> Tensor + structured_delegate: ge.Tensor_out + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + QuantizedCPU: ge_quantized_cpu + tags: [core, pointwise] + +- func: ge_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + structured_delegate: ge.Scalar_out + device_check: NoCheck # TensorIterator + variants: method + +- func: ge_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) 
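+# [editorial note, not upstream text] Alias entries such as `not_equal` above carry no
+# dispatch keys: codegen registers them as thin composite wrappers over the canonical
+# op, so both spellings hit the same kernel. Illustrative:
+#   >>> import torch
+#   >>> a, b = torch.tensor([1, 2]), torch.tensor([2, 2])
+#   >>> torch.equal(torch.not_equal(a, b), torch.ne(a, b))
+#   True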
+ structured_delegate: ge.Tensor_out + device_check: NoCheck # TensorIterator + variants: method + +# greater_equal, alias for torch.ge +- func: greater_equal.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + +- func: greater_equal.Scalar(Tensor self, Scalar other) -> Tensor + variants: method, function + +- func: greater_equal.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + +- func: greater_equal.Tensor(Tensor self, Tensor other) -> Tensor + variants: method, function + +- func: greater_equal_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + variants: method + +- func: greater_equal_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + variants: method + +- func: le.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: le_Scalar_out + MPS: le_scalar_out_mps + QuantizedCPU: le_out_quantized_cpu + tags: pointwise + +- func: le.Scalar(Tensor self, Scalar other) -> Tensor + structured_delegate: le.Scalar_out + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + QuantizedCPU: le_quantized_cpu + tags: [core, pointwise] + +- func: le.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: le_Tensor_out + MPS: le_tensor_out_mps + QuantizedCPU: le_out_quantized_cpu + tags: pointwise + +- func: le.Tensor(Tensor self, Tensor other) -> Tensor + structured_delegate: le.Tensor_out + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + QuantizedCPU: le_quantized_cpu + tags: [core, pointwise] + +- func: le_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + structured_delegate: le.Scalar_out + device_check: NoCheck # TensorIterator + variants: method + +- func: le_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + structured_delegate: le.Tensor_out + device_check: NoCheck # TensorIterator + variants: method + +# less_equal, alias for torch.le +- func: less_equal.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + +- func: less_equal.Scalar(Tensor self, Scalar other) -> Tensor + variants: method, function + +- func: less_equal.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + +- func: less_equal.Tensor(Tensor self, Tensor other) -> Tensor + variants: method, function + +- func: less_equal_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + variants: method + +- func: less_equal_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + variants: method + +- func: gt.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: gt_Scalar_out + MPS: gt_scalar_out_mps + QuantizedCPU: gt_out_quantized_cpu + tags: pointwise + +- func: gt.Scalar(Tensor self, Scalar other) -> Tensor + structured_delegate: gt.Scalar_out + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + QuantizedCPU: gt_quantized_cpu + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: gt_scalar_nested + tags: [core, pointwise] + +- func: gt.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
+ structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: gt_Tensor_out + MPS: gt_tensor_out_mps + QuantizedCPU: gt_out_quantized_cpu + tags: pointwise + +- func: gt.Tensor(Tensor self, Tensor other) -> Tensor + structured_delegate: gt.Tensor_out + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + QuantizedCPU: gt_quantized_cpu + tags: [core, pointwise] + +- func: gt_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + structured_delegate: gt.Scalar_out + device_check: NoCheck # TensorIterator + variants: method + +- func: gt_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + structured_delegate: gt.Tensor_out + device_check: NoCheck # TensorIterator + variants: method + +# greater, alias for torch.gt +- func: greater.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + +- func: greater.Scalar(Tensor self, Scalar other) -> Tensor + variants: method, function + +- func: greater.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + +- func: greater.Tensor(Tensor self, Tensor other) -> Tensor + variants: method, function + +- func: greater_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + variants: method + +- func: greater_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + variants: method + +- func: lt.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: lt_Scalar_out + MPS: lt_scalar_out_mps + QuantizedCPU: lt_out_quantized_cpu + tags: pointwise + +- func: lt.Scalar(Tensor self, Scalar other) -> Tensor + structured_delegate: lt.Scalar_out + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + QuantizedCPU: lt_quantized_cpu + tags: [core, pointwise] + +- func: lt.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: lt_Tensor_out + MPS: lt_tensor_out_mps + QuantizedCPU: lt_out_quantized_cpu + tags: pointwise + +- func: lt.Tensor(Tensor self, Tensor other) -> Tensor + structured_delegate: lt.Tensor_out + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + QuantizedCPU: lt_quantized_cpu + tags: [core, pointwise] + +- func: lt_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + structured_delegate: lt.Scalar_out + device_check: NoCheck # TensorIterator + variants: method + +- func: lt_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + structured_delegate: lt.Tensor_out + device_check: NoCheck # TensorIterator + variants: method + +# less, alias for torch.lt +- func: less.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + +- func: less.Scalar(Tensor self, Scalar other) -> Tensor + variants: method, function + +- func: less.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + +- func: less.Tensor(Tensor self, Tensor other) -> Tensor + variants: method, function + +- func: less_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + variants: method + +- func: less_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + variants: method + +- func: take.out(Tensor self, Tensor index, *, Tensor(a!) out) -> Tensor(a!) 
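+# [editorial note, not upstream text] `take` (declared here) indexes `self` as if it
+# were flattened to 1-D, whatever its shape; a sketch of the documented behavior:
+#   >>> import torch
+#   >>> src = torch.tensor([[4, 3, 5], [6, 7, 8]])
+#   >>> torch.take(src, torch.tensor([0, 2, 5]))
+#   tensor([4, 5, 8])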
+ dispatch: + CPU, CUDA: take_out + +- func: take(Tensor self, Tensor index) -> Tensor + variants: method, function + dispatch: + CPU, CUDA: take + +- func: take_along_dim.out(Tensor self, Tensor indices, int? dim=None, *, Tensor(a!) out) -> Tensor(a!) + +- func: take_along_dim(Tensor self, Tensor indices, int? dim=None) -> Tensor + variants: method, function + +- func: index_select.out(Tensor self, int dim, Tensor index, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, QuantizedCPU: index_select_out_cpu_ + CUDA, QuantizedCUDA: index_select_out_cuda + MPS: index_select_out_mps + +- func: index_select(Tensor self, int dim, Tensor index) -> Tensor + variants: method, function + dispatch: + CPU: index_select_cpu_ + QuantizedCPU: index_select_quantized_cpu_ + CUDA: index_select_cuda + QuantizedCUDA: index_select_quantized_cuda + SparseCPU: index_select_sparse_cpu + SparseCUDA: index_select_sparse_cuda + SparseMPS: index_select_sparse_mps + MPS: index_select_mps + tags: core + +- func: index_select.dimname_out(Tensor self, Dimname dim, Tensor index, *, Tensor(a!) out) -> Tensor(a!) + +- func: index_select.dimname(Tensor self, Dimname dim, Tensor index) -> Tensor + variants: method, function + +- func: index_select_backward(Tensor grad, SymInt[] self_sizes, int dim, Tensor index) -> Tensor + variants: function + device_check: NoCheck + device_guard: False + dispatch: + CompositeImplicitAutograd: index_select_backward_symint + +- func: masked_select.out(Tensor self, Tensor mask, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU: masked_select_out_cpu + CUDA: masked_select_out_cuda + MPS: masked_select_out_mps + tags: dynamic_output_shape + +- func: masked_select(Tensor self, Tensor mask) -> Tensor + variants: method, function + dispatch: + CPU: masked_select_cpu + CUDA: masked_select_cuda + MPS: masked_select_mps + tags: dynamic_output_shape + +- func: masked_select_backward(Tensor grad, Tensor input, Tensor mask) -> Tensor + variants: function + device_check: NoCheck + device_guard: False + +- func: nonzero.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU: nonzero_out_cpu + CUDA: nonzero_out_cuda + MPS: nonzero_out_mps + tags: dynamic_output_shape + +- func: nonzero(Tensor self) -> Tensor + variants: method, function + dispatch: + CPU: nonzero_cpu + CUDA: nonzero_cuda + MPS: nonzero_mps + tags: [dynamic_output_shape, core] + +- func: nonzero_static.out(Tensor self, *, SymInt size, int fill_value=-1, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU: nonzero_static_out_cpu + CUDA: nonzero_static_out_cuda + +- func: nonzero_static(Tensor self, *, SymInt size, int fill_value=-1) -> Tensor + variants: method, function + dispatch: + CPU: nonzero_static_cpu + CUDA: nonzero_static_cuda + +- func: nonzero_numpy(Tensor self) -> Tensor[] + variants: method, function + +- func: argwhere(Tensor self) -> Tensor + variants: method, function + tags: dynamic_output_shape + +- func: gather.out(Tensor self, int dim, Tensor index, *, bool sparse_grad=False, Tensor(a!) out) -> Tensor(a!) 
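+# [editorial note, not upstream text] `gather` (declared here) picks elements along
+# `dim` via `index`; for a 2-D input with dim=0 the documented rule is
+#   out[i][j] = self[index[i][j]][j]
+# Illustrative:
+#   >>> import torch
+#   >>> t = torch.tensor([[1, 2], [3, 4]])
+#   >>> torch.gather(t, 0, torch.tensor([[1, 0]]))
+#   tensor([[3, 2]])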
+ structured: True + dispatch: + CPU, CUDA: gather_out + MPS: gather_out_mps + +- func: gather(Tensor self, int dim, Tensor index, *, bool sparse_grad=False) -> Tensor + variants: method, function + structured_delegate: gather.out + tags: core + +- func: gather_backward(Tensor grad, Tensor self, int dim, Tensor index, bool sparse_grad) -> Tensor + variants: function + device_check: NoCheck + device_guard: False + +- func: gather.dimname_out(Tensor self, Dimname dim, Tensor index, *, bool sparse_grad=False, Tensor(a!) out) -> Tensor(a!) + +- func: gather.dimname(Tensor self, Dimname dim, Tensor index, *, bool sparse_grad=False) -> Tensor + variants: method, function + +- func: _gather_sparse_backward(Tensor self, int dim, Tensor index, Tensor grad) -> Tensor + +- func: addcmul.out(Tensor self, Tensor tensor1, Tensor tensor2, *, Scalar value=1, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: addcmul_out + MPS: addcmul_out_mps + tags: pointwise + +- func: addcmul(Tensor self, Tensor tensor1, Tensor tensor2, *, Scalar value=1) -> Tensor + structured_delegate: addcmul.out + device_check: NoCheck # TensorIterator + variants: method, function + tags: pointwise + +- func: addcmul_(Tensor(a!) self, Tensor tensor1, Tensor tensor2, *, Scalar value=1) -> Tensor(a!) + structured_delegate: addcmul.out + device_check: NoCheck # TensorIterator + variants: method + tags: pointwise + +- func: addcdiv.out(Tensor self, Tensor tensor1, Tensor tensor2, *, Scalar value=1, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: addcdiv_out + MPS: addcdiv_out_mps + tags: pointwise + +- func: addcdiv(Tensor self, Tensor tensor1, Tensor tensor2, *, Scalar value=1) -> Tensor + structured_delegate: addcdiv.out + device_check: NoCheck # TensorIterator + variants: method, function + tags: pointwise + +- func: addcdiv_(Tensor(a!) self, Tensor tensor1, Tensor tensor2, *, Scalar value=1) -> Tensor(a!) + structured_delegate: addcdiv.out + device_check: NoCheck # TensorIterator + variants: method + tags: pointwise + +- func: cross_entropy_loss(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, SymInt ignore_index=-100, float label_smoothing=0.0) -> Tensor + python_module: nn + dispatch: + CompositeImplicitAutograd: cross_entropy_loss_symint + +- func: triangular_solve.X(Tensor self, Tensor A, bool upper=True, bool transpose=False, bool unitriangular=False, *, Tensor(a!) X, Tensor(b!) M) -> (Tensor(a!) solution, Tensor(b!) cloned_coefficient) + structured: True + dispatch: + CPU, CUDA: triangular_solve_out + MPS: triangular_solve_mps_out + SparseCsrCPU: triangular_solve_out_sparse_csr_cpu + SparseCsrCUDA: triangular_solve_out_sparse_csr_cuda + +- func: triangular_solve(Tensor self, Tensor A, bool upper=True, bool transpose=False, bool unitriangular=False) -> (Tensor solution, Tensor cloned_coefficient) + structured_delegate: triangular_solve.X + variants: method, function + +- func: _linalg_check_errors(Tensor info, str api_name, *, bool is_matrix) -> () + dispatch: + CompositeExplicitAutograd: _linalg_check_errors + +- func: linalg_solve_triangular.out(Tensor self, Tensor B, *, bool upper, bool left=True, bool unitriangular=False, Tensor(a!) out) -> Tensor(a!) 
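+# [editorial note, not upstream text] The fused pointwise ops above compute
+#   addcmul: out = self + value * tensor1 * tensor2
+#   addcdiv: out = self + value * tensor1 / tensor2
+# Illustrative:
+#   >>> import torch
+#   >>> torch.addcmul(torch.zeros(2), torch.tensor([1., 2.]), torch.tensor([3., 4.]), value=2)
+#   tensor([ 6., 16.])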
+ python_module: linalg + dispatch: + CPU, CUDA: linalg_solve_triangular_out + MPS: linalg_solve_triangular_mps_out + +- func: linalg_solve_triangular(Tensor self, Tensor B, *, bool upper, bool left=True, bool unitriangular=False) -> Tensor + python_module: linalg + variants: function + dispatch: + CPU, CUDA: linalg_solve_triangular + MPS: linalg_solve_triangular_mps + +- func: linalg_vander(Tensor x, *, SymInt? N=None) -> Tensor + python_module: linalg + dispatch: + CompositeImplicitAutograd: linalg_vander_symint + +- func: svd.U(Tensor self, bool some=True, bool compute_uv=True, *, Tensor(a!) U, Tensor(b!) S, Tensor(c!) V) -> (Tensor(a!) U, Tensor(b!) S, Tensor(c!) V) + +- func: svd(Tensor self, bool some=True, bool compute_uv=True) -> (Tensor U, Tensor S, Tensor V) + variants: method, function + +# swapaxes, alias for transpose +- func: swapaxes(Tensor(a) self, int axis0, int axis1) -> Tensor(a) + variants: function, method + device_check: NoCheck + device_guard: False + +- func: swapaxes_(Tensor(a!) self, int axis0, int axis1) -> Tensor(a!) + variants: method + device_check: NoCheck + device_guard: False + tags: inplace_view + +# swapdims, alias for transpose +- func: swapdims(Tensor(a) self, int dim0, int dim1) -> Tensor(a) + variants: function, method + device_check: NoCheck + device_guard: False + +- func: swapdims_(Tensor(a!) self, int dim0, int dim1) -> Tensor(a!) + variants: method + device_check: NoCheck + device_guard: False + tags: inplace_view + +- func: cholesky.out(Tensor self, bool upper=False, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA, MPS: cholesky_out + +- func: cholesky(Tensor self, bool upper=False) -> Tensor + variants: method, function + dispatch: + CPU, CUDA, MPS: cholesky + +- func: cholesky_solve.out(Tensor self, Tensor input2, bool upper=False, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: cholesky_solve_out + +- func: cholesky_solve(Tensor self, Tensor input2, bool upper=False) -> Tensor + variants: method, function + dispatch: + CompositeExplicitAutograd: cholesky_solve + +- func: _cholesky_solve_helper(Tensor self, Tensor A, bool upper) -> Tensor + variants: function + dispatch: + CPU: _cholesky_solve_helper_cpu + CUDA: _cholesky_solve_helper_cuda + autogen: _cholesky_solve_helper.out + +- func: cholesky_inverse(Tensor self, bool upper=False) -> Tensor + variants: method, function + dispatch: + CPU, CUDA: cholesky_inverse + +- func: cholesky_inverse.out(Tensor self, bool upper=False, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA: cholesky_inverse_out + +- func: qr.Q(Tensor self, bool some=True, *, Tensor(a!) Q, Tensor(b!) R) -> (Tensor(a!) Q, Tensor(b!) R) + +- func: qr(Tensor self, bool some=True) -> (Tensor Q, Tensor R) + variants: method, function + +- func: geqrf.a(Tensor self, *, Tensor(a!) a, Tensor(b!) tau) -> (Tensor(a!) a, Tensor(b!) tau) + dispatch: + CPU, CUDA: geqrf_out + +- func: geqrf(Tensor self) -> (Tensor a, Tensor tau) + variants: method, function + dispatch: + CPU, CUDA: geqrf + +# orgqr, alias for linalg_householder_product +- func: orgqr(Tensor self, Tensor input2) -> Tensor + variants: method, function + +- func: orgqr.out(Tensor self, Tensor input2, *, Tensor(a!) out) -> Tensor(a!) + +- func: ormqr.out(Tensor self, Tensor input2, Tensor input3, bool left=True, bool transpose=False, *, Tensor(a!) out) -> Tensor(a!) 
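+# [editorial note, not upstream text] `geqrf` above returns the packed LAPACK
+# factorization (a, tau); `orgqr` (an alias for `linalg_householder_product`)
+# materializes Q from it, and `ormqr` (declared here) multiplies by Q without forming
+# it. A hedged sketch:
+#   >>> import torch
+#   >>> A = torch.randn(4, 3)
+#   >>> a, tau = torch.geqrf(A)
+#   >>> Q = torch.orgqr(a, tau)        # 4x3, orthonormal columns
+#   >>> torch.allclose(Q.T @ Q, torch.eye(3), atol=1e-5)
+#   True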
+ dispatch: + CPU, CUDA: ormqr_out + +- func: ormqr(Tensor self, Tensor input2, Tensor input3, bool left=True, bool transpose=False) -> Tensor + variants: method, function + dispatch: + CPU, CUDA: ormqr + +- func: _lu_with_info(Tensor self, bool pivot=True, bool check_errors=True) -> (Tensor LU, Tensor pivots, Tensor info) + variants: function + +- func: lu_solve.out(Tensor self, Tensor LU_data, Tensor LU_pivots, *, Tensor(a!) out) -> Tensor(a!) + +- func: lu_solve(Tensor self, Tensor LU_data, Tensor LU_pivots) -> Tensor + variants: method, function + +# lu_unpack +- func: lu_unpack(Tensor LU_data, Tensor LU_pivots, bool unpack_data=True, bool unpack_pivots=True) -> (Tensor P, Tensor L, Tensor U) + structured_delegate: lu_unpack.out + variants: function + +- func: lu_unpack.out(Tensor LU_data, Tensor LU_pivots, bool unpack_data=True, bool unpack_pivots=True, *, Tensor(a!) P, Tensor(b!) L, Tensor(c!) U) -> (Tensor(a!) P, Tensor(b!) L, Tensor(c!) U) + variants: function + structured: True + dispatch: + CPU, CUDA, MPS: lu_unpack_out + +# TODO: remove dispatch section when porting TH CUDA to ATen +- func: multinomial.out(Tensor self, SymInt num_samples, bool replacement=False, *, Generator? generator=None, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded + dispatch: + CPU, CUDA: multinomial_out + MPS: multinomial_out_mps + +- func: multinomial(Tensor self, SymInt num_samples, bool replacement=False, *, Generator? generator=None) -> Tensor + variants: method, function + dispatch: + CPU, CUDA: multinomial + MPS: multinomial_mps + tags: nondeterministic_seeded + +- func: lgamma.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: lgamma_out + MPS: lgamma_out_mps + tags: pointwise + +- func: lgamma_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: lgamma.out + variants: method + tags: pointwise + +- func: lgamma(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: lgamma.out + variants: method, function + tags: pointwise + +- func: digamma.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: digamma_out + MPS: digamma_out_mps + tags: pointwise + +- func: digamma(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: digamma.out + variants: method, function + tags: pointwise + +- func: polygamma.out(int n, Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: polygamma_out + MPS: polygamma_out_mps + tags: pointwise + +- func: polygamma(int n, Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: polygamma.out + variants: method, function + tags: pointwise + +- func: polygamma_(Tensor(a!) self, int n) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CompositeExplicitAutograd: polygamma_ + tags: pointwise + +- func: erfinv(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: erfinv.out + variants: method, function + dispatch: + SparseCPU, SparseCUDA, SparseMPS: erfinv_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: erfinv_sparse_csr + tags: pointwise + +- func: erfinv_(Tensor(a!) self) -> Tensor(a!) 
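self) -> Tensor(a!)
+# [editorial note: the signature fragment above completes the truncated `erfinv_`
+# declaration by analogy with the other in-place unary entries; the rest is not
+# upstream commentary] Note the argument-order quirk in the polygamma entries above:
+# the functional form is `polygamma(n, self)` while the in-place method is
+# `self.polygamma_(n)`. Illustrative:
+#   >>> import torch
+#   >>> x = torch.tensor([1.0])
+#   >>> torch.allclose(torch.polygamma(1, x), x.clone().polygamma_(1))
+#   True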
+ device_check: NoCheck # TensorIterator + structured_delegate: erfinv.out + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: erfinv_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: erfinv_sparse_csr_ + tags: pointwise + +- func: erfinv.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: erfinv_out + SparseCPU, SparseCUDA, SparseMPS: erfinv_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: erfinv_sparse_csr_out + tags: pointwise + +- func: i0(Tensor self) -> Tensor + structured_delegate: i0.out + variants: function, method + tags: pointwise + +- func: i0_(Tensor(a!) self) -> Tensor(a!) + structured_delegate: i0.out + variants: function, method + tags: pointwise + +- func: i0.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: i0_out + tags: pointwise + +- func: sign(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: sign.out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: sign_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sign_sparse_csr + tags: [core, pointwise] + +- func: sign_(Tensor(a!) self) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: sign.out + variants: method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: sign_sparse_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sign_sparse_csr_ + tags: pointwise + +- func: sign.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: sign_out + MPS: sign_out_mps + SparseCPU, SparseCUDA, SparseMPS: sign_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: sign_sparse_csr_out + tags: pointwise + +- func: signbit(Tensor self) -> Tensor + variants: function, method + structured_delegate: signbit.out + dispatch: + SparseCPU, SparseCUDA, SparseMPS: signbit_sparse + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: signbit_sparse_csr + tags: pointwise + +- func: signbit.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU: signbit_out + CUDA: signbit_out + MPS: signbit_out_mps + SparseCPU, SparseCUDA, SparseMPS: signbit_sparse_out + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: signbit_sparse_csr_out + tags: pointwise + +- func: dist(Tensor self, Tensor other, Scalar p=2) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + CompositeExplicitAutograd: dist + autogen: dist.out + +- func: atan2.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: atan2_out + MPS: atan2_out_mps + tags: [core, pointwise] + +- func: atan2_(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: atan2.out + variants: method + tags: pointwise + +- func: atan2(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: atan2.out + variants: method, function + tags: [core, pointwise] +# arctan2, alias of atan2 + +- func: arctan2(Tensor self, Tensor other) -> Tensor + variants: method, function + +- func: arctan2.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) 
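+# [editorial note, not upstream text] `lerp`, declared just below, computes the
+# documented linear interpolation out = start + weight * (end - start). Illustrative:
+#   >>> import torch
+#   >>> torch.lerp(torch.tensor([0.0]), torch.tensor([10.0]), 0.3)
+#   tensor([3.])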
+ device_check: NoCheck # TensorIterator + +- func: arctan2_(Tensor(a!) self, Tensor other) -> Tensor(a!) + variants: method + +- func: lerp.Scalar_out(Tensor self, Tensor end, Scalar weight, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: lerp_Scalar + tags: pointwise + +- func: lerp.Tensor_out(Tensor self, Tensor end, Tensor weight, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: lerp_Tensor + MPS: lerp_Tensor_mps + tags: pointwise + +- func: lerp.Scalar(Tensor self, Tensor end, Scalar weight) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + structured_delegate: lerp.Scalar_out + tags: pointwise + +- func: lerp.Tensor(Tensor self, Tensor end, Tensor weight) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + structured_delegate: lerp.Tensor_out + tags: pointwise + +- func: histc.out(Tensor self, int bins=100, Scalar min=0, Scalar max=0, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, MPS: histogram_histc_out + CUDA: _histc_out_cuda + +- func: histc(Tensor self, int bins=100, Scalar min=0, Scalar max=0) -> Tensor + variants: method, function + dispatch: + CPU, MPS: histogram_histc + CUDA: _histc_cuda + +- func: histogram.bins_tensor_out(Tensor self, Tensor bins, *, Tensor? weight=None, bool density=False, Tensor(a!) hist, Tensor(b!) bin_edges) -> (Tensor(a!) hist, Tensor(b!) bin_edges) + dispatch: + CPU, MPS: histogram_out + +- func: histogram.bins_tensor(Tensor self, Tensor bins, *, Tensor? weight=None, bool density=False) -> (Tensor hist, Tensor bin_edges) + variants: method, function + dispatch: + CPU, MPS: histogram + +- func: histogram.bin_ct_out(Tensor self, int bins=100, *, float[]? range=None, Tensor? weight=None, bool density=False, Tensor(a!) hist, Tensor(b!) bin_edges) -> (Tensor(a!) hist, Tensor(b!) bin_edges) + dispatch: + CPU, MPS: histogram_out + +- func: histogram.bin_ct(Tensor self, int bins=100, *, float[]? range=None, Tensor? weight=None, bool density=False) -> (Tensor hist, Tensor bin_edges) + variants: method, function + dispatch: + CPU, MPS: histogram + +- func: _histogramdd_bin_edges(Tensor self, int[] bins, *, float[]? range=None, Tensor? weight=None, bool density=False) -> Tensor[] + dispatch: + CPU, MPS: histogramdd_bin_edges + autogen: _histogramdd_bin_edges.out + +- func: _histogramdd_from_bin_cts(Tensor self, int[] bins, *, float[]? range=None, Tensor? weight=None, bool density=False) -> Tensor + dispatch: + CPU, MPS: _histogramdd + autogen: _histogramdd_from_bin_cts.out + +- func: _histogramdd_from_bin_tensors(Tensor self, Tensor[] bins, *, Tensor? weight=None, bool density=False) -> Tensor + dispatch: + CPU, MPS: _histogramdd + autogen: _histogramdd_from_bin_tensors.out + +- func: histogramdd(Tensor self, int[] bins, float[]? range=None, Tensor? weight=None, bool density=False) -> (Tensor hist, Tensor[] bin_edges) + +- func: histogramdd.int_bins(Tensor self, int bins, float[]? range=None, Tensor? weight=None, bool density=False) -> (Tensor hist, Tensor[] bin_edges) + +- func: histogramdd.TensorList_bins(Tensor self, Tensor[] bins, float[]? range=None, Tensor? weight=None, bool density=False) -> (Tensor hist, Tensor[] bin_edges) + +- func: fmod.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) 
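+# [editorial note, not upstream text] `fmod` (declared here) keeps the sign of the
+# dividend, while `remainder` further below follows the sign of the divisor. A worked
+# contrast:
+#   >>> import torch
+#   >>> torch.fmod(torch.tensor([-3.0]), 2)
+#   tensor([-1.])
+#   >>> torch.remainder(torch.tensor([-3.0]), 2)
+#   tensor([1.])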
+ device_check: NoCheck # TensorIterator + dispatch: + CompositeExplicitAutograd: fmod_out + tags: pointwise + +- func: fmod.Scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + CompositeExplicitAutograd: fmod + tags: [core, pointwise] + +- func: fmod_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + dispatch: + CompositeExplicitAutograd: fmod_ + tags: pointwise + +- func: fmod.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS, MTIA: fmod_out + tags: pointwise + +- func: fmod.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: fmod.Tensor_out + variants: method, function + tags: [core, pointwise] + +- func: fmod_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: method + structured_delegate: fmod.Tensor_out + tags: pointwise + +- func: hypot.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: hypot_out + tags: pointwise + +- func: hypot(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: hypot.out + variants: method, function + tags: pointwise + +- func: hypot_(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: hypot.out + variants: method + tags: pointwise + +- func: igamma.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: igamma_out + tags: pointwise + +- func: igamma(Tensor self, Tensor other) -> Tensor + structured_delegate: igamma.out + variants: method, function + tags: pointwise + +- func: igamma_(Tensor(a!) self, Tensor other) -> Tensor(a!) + structured_delegate: igamma.out + variants: method + tags: pointwise + +- func: igammac.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: igammac_out + tags: pointwise + +- func: igammac(Tensor self, Tensor other) -> Tensor + structured_delegate: igammac.out + variants: method, function + tags: pointwise + +- func: igammac_(Tensor(a!) self, Tensor other) -> Tensor(a!) + structured_delegate: igammac.out + variants: method + tags: pointwise + +- func: nextafter.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: nextafter_out + tags: pointwise + +- func: nextafter(Tensor self, Tensor other) -> Tensor + structured_delegate: nextafter.out + variants: method, function + tags: pointwise + +- func: nextafter_(Tensor(a!) self, Tensor other) -> Tensor(a!) + structured_delegate: nextafter.out + variants: method + tags: pointwise + +- func: remainder.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: remainder_out + tags: pointwise + +- func: remainder.Scalar(Tensor self, Scalar other) -> Tensor + variants: method, function + dispatch: + CompositeExplicitAutograd: remainder + tags: [core, pointwise] + +- func: remainder_.Scalar(Tensor(a!) 
self, Scalar other) -> Tensor(a!) + variants: method + dispatch: + CompositeExplicitAutograd: remainder_ + tags: pointwise + +- func: remainder.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS, MTIA: remainder_out + tags: pointwise + +- func: remainder.Tensor(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: remainder.Tensor_out + variants: method, function + tags: [core, pointwise] + +- func: remainder_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: remainder.Tensor_out + variants: method + tags: pointwise + +- func: remainder.Scalar_Tensor(Scalar self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: function + dispatch: + CPU, CUDA, MPS: remainder + autogen: remainder.Scalar_Tensor_out + tags: pointwise + +- func: min(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + CPU, CUDA: min + MPS: min_mps + QuantizedCPU: min_quantized_cpu + tags: [reduction] + +- func: min.unary_out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA: min_unary_out + QuantizedCPU: min_quantized_unary_out + tags: [reduction] + +- func: fmin(Tensor self, Tensor other) -> Tensor + structured_delegate: fmin.out + device_check: NoCheck # TensorIterator + variants: method, function + tags: pointwise + +- func: fmin.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MPS: fmin_out + tags: pointwise + +- func: max(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + CPU, CUDA: max + MPS: max_mps + QuantizedCPU: max_quantized_cpu + tags: [reduction] + +- func: fmax(Tensor self, Tensor other) -> Tensor + structured_delegate: fmax.out + device_check: NoCheck # TensorIterator + variants: method, function + tags: pointwise + +- func: fmax.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MPS: fmax_out + tags: pointwise + +- func: maximum(Tensor self, Tensor other) -> Tensor + structured_delegate: maximum.out + device_check: NoCheck # TensorIterator + variants: method, function + tags: [core, pointwise] + +- func: maximum.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: maximum_out + MPS: maximum_out_mps + tags: pointwise + +# binary max, alias of maximum +# NOTE: max is not an alias for maximum, since there is also unary max +- func: max.other(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + tags: pointwise + +- func: max.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + tags: pointwise + +- func: max.unary_out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) 
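+# [editorial note, not upstream text] As the NOTE above says, binary `max` is not a
+# plain alias of `maximum`, because unary `max` is a reduction. Illustrative:
+#   >>> import torch
+#   >>> t, u = torch.tensor([1, 4]), torch.tensor([3, 2])
+#   >>> torch.max(t)                   # unary reduction
+#   tensor(4)
+#   >>> torch.equal(torch.max(t, u), torch.maximum(t, u))
+#   True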
+ device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA: max_unary_out + QuantizedCPU: max_quantized_unary_out + tags: [reduction] + +- func: minimum(Tensor self, Tensor other) -> Tensor + structured_delegate: minimum.out + device_check: NoCheck # TensorIterator + variants: method, function + tags: [core, pointwise] + +- func: minimum.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA, MTIA: minimum_out + MPS: minimum_out_mps + tags: pointwise + +# binary min, alias for minimum +# NOTE: min is not an alias for minimum, since there is also unary min +- func: min.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + tags: pointwise + +- func: min.other(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + tags: pointwise + +- func: quantile(Tensor self, Tensor q, int? dim=None, bool keepdim=False, *, str interpolation='linear') -> Tensor + variants: method, function + +- func: quantile.out(Tensor self, Tensor q, int? dim=None, bool keepdim=False, *, str interpolation='linear', Tensor(a!) out) -> Tensor(a!) + +- func: quantile.scalar(Tensor self, float q, int? dim=None, bool keepdim=False, *, str interpolation='linear') -> Tensor + variants: method, function + +- func: quantile.scalar_out(Tensor self, float q, int? dim=None, bool keepdim=False, *, str interpolation='linear', Tensor(a!) out) -> Tensor(a!) + +- func: nanquantile(Tensor self, Tensor q, int? dim=None, bool keepdim=False, *, str interpolation='linear') -> Tensor + variants: method, function + +- func: nanquantile.out(Tensor self, Tensor q, int? dim=None, bool keepdim=False, *, str interpolation='linear', Tensor(a!) out) -> Tensor(a!) + +- func: nanquantile.scalar(Tensor self, float q, int? dim=None, bool keepdim=False, *, str interpolation='linear') -> Tensor + variants: method, function + +- func: nanquantile.scalar_out(Tensor self, float q, int? dim=None, bool keepdim=False, *, str interpolation='linear', Tensor(a!) out) -> Tensor(a!) + +- func: sort.values(Tensor self, int dim=-1, bool descending=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices) + device_check: NoCheck # TensorIterator + dispatch: + CompositeExplicitAutograd: sort_out + +- func: sort.values_stable(Tensor self, *, bool? stable, int dim=-1, bool descending=False, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices) + structured: True + dispatch: + CPU, CUDA: sort_stable_out + MPS: sort_stable_out_mps + +- func: sort(Tensor self, int dim=-1, bool descending=False) -> (Tensor values, Tensor indices) + device_check: NoCheck # TensorIterator + variants: method, function + dispatch: + CompositeExplicitAutograd: sort + tags: core + +- func: sort.stable(Tensor self, *, bool? stable, int dim=-1, bool descending=False) -> (Tensor values, Tensor indices) + structured_delegate: sort.values_stable + variants: method, function + dispatch: + QuantizedCPU: sort_quantized_cpu_stable + +- func: sort.dimname_values(Tensor self, Dimname dim, bool descending=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices) + +- func: sort.dimname_values_stable(Tensor self, *, bool? stable, Dimname dim, bool descending=False, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) 
indices) + +- func: sort.dimname(Tensor self, Dimname dim, bool descending=False) -> (Tensor values, Tensor indices) + variants: method, function + +- func: sort.dimname_stable(Tensor self, *, bool? stable, Dimname dim, bool descending=False) -> (Tensor values, Tensor indices) + variants: method, function + +- func: msort.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + +- func: msort(Tensor self) -> Tensor + variants: method, function + +- func: argsort(Tensor self, int dim=-1, bool descending=False) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + +- func: argsort.stable(Tensor self, *, bool stable, int dim=-1, bool descending=False) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + +- func: argsort.stable_out(Tensor self, *, bool stable, int dim=-1, bool descending=False, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + variants: function + +- func: argsort.dimname(Tensor self, Dimname dim, bool descending=False) -> Tensor + variants: method, function + +- func: topk.values(Tensor self, SymInt k, int dim=-1, bool largest=True, bool sorted=True, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices) + structured: True + dispatch: + CPU: topk_out_cpu + CUDA: topk_out_cuda + MPS: topk_out_mps + +- func: topk(Tensor self, SymInt k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices) + variants: method, function + structured_delegate: topk.values + dispatch: + QuantizedCPU: topk_quantized_cpu + tags: core + +- func: all(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: all.all_out + variants: method, function + tags: reduction + +- func: all.all_out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck + structured: True + dispatch: + CPU, CUDA: all_all_out + MTIA: all_all_out_mtia + MPS: all_all_out_mps + tags: reduction + +- func: any(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: any.all_out + variants: method, function + dispatch: + SparseCPU, SparseCUDA, SparseMPS: any_sparse + tags: [core, reduction] + +- func: any.all_out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck + structured: True + dispatch: + CPU, CUDA: any_all_out + MPS: any_all_out_mps + tags: reduction + +- func: renorm.out(Tensor self, Scalar p, int dim, Scalar maxnorm, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + dispatch: + CPU, CUDA: renorm_out + MPS: renorm_out_mps + +- func: renorm(Tensor self, Scalar p, int dim, Scalar maxnorm) -> Tensor + device_check: NoCheck # TensorIterator + variants: method, function + structured_delegate: renorm.out + +- func: renorm_(Tensor(a!) self, Scalar p, int dim, Scalar maxnorm) -> Tensor(a!) 
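+# [editorial note, not upstream text] `renorm` rescales every sub-tensor along `dim`
+# whose p-norm exceeds `maxnorm` down to exactly `maxnorm`, leaving the rest untouched.
+# Illustrative, row-wise:
+#   >>> import torch
+#   >>> x = torch.tensor([[3.0, 4.0], [0.3, 0.4]])
+#   >>> torch.renorm(x, 2, 0, 1.0)
+#   tensor([[0.6000, 0.8000],
+#           [0.3000, 0.4000]])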
+ device_check: NoCheck # TensorIterator + variants: method + structured_delegate: renorm.out + +- func: unfold(Tensor(a) self, int dimension, int size, int step) -> Tensor(a) + variants: method + device_check: NoCheck + device_guard: False + dispatch: + CPU, CUDA, Meta, MPS, MTIA: unfold + QuantizedCPU, QuantizedCUDA: unfold + +- func: unfold_backward(Tensor grad_in, SymInt[] input_sizes, int dim, int size, int step) -> Tensor + variants: function + dispatch: + CPU, CUDA, MPS: unfold_backward + autogen: unfold_backward.out + +- func: equal(Tensor self, Tensor other) -> bool + tags: [data_dependent_output, pointwise] + variants: method, function + dispatch: + CPU: cpu_equal + CUDA: cuda_equal + MPS: mps_equal + QuantizedCPU: equal_quantized_cpu + +- func: pow.Tensor_Tensor_out(Tensor self, Tensor exponent, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: pow_Tensor_Tensor_out + MPS: pow_tensor_tensor_out_mps + tags: pointwise + +- func: pow.Tensor_Tensor(Tensor self, Tensor exponent) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: pow.Tensor_Tensor_out + variants: method, function + tags: [core, pointwise] + +- func: pow.Scalar_out(Scalar self, Tensor exponent, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + dispatch: + CPU, CUDA: pow_Scalar_out + MPS: pow_Scalar_out_mps + tags: pointwise + +- func: pow.Scalar(Scalar self, Tensor exponent) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: pow.Scalar_out + tags: [core, pointwise] + +- func: pow.Tensor_Scalar_out(Tensor self, Scalar exponent, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: pow_Tensor_Scalar_out + SparseCPU, SparseCUDA, SparseMPS: pow_out_sparse_scalar + MPS: pow_tensor_scalar_out_mps + tags: pointwise + +- func: pow.Tensor_Scalar(Tensor self, Scalar exponent) -> Tensor + device_check: NoCheck # TensorIterator + structured_delegate: pow.Tensor_Scalar_out + variants: function, method + dispatch: + SparseCPU, SparseCUDA, SparseMPS: pow_sparse_scalar + tags: [core, pointwise] + +- func: pow_.Scalar(Tensor(a!) self, Scalar exponent) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: pow.Tensor_Scalar_out + variants: method + tags: pointwise + +- func: pow_.Tensor(Tensor(a!) self, Tensor exponent) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured_delegate: pow.Tensor_Tensor_out + variants: method + tags: pointwise + +- func: float_power.Tensor_Tensor_out(Tensor self, Tensor exponent, *, Tensor(a!) out) -> Tensor(a!) + tags: pointwise + +- func: float_power.Tensor_Tensor(Tensor self, Tensor exponent) -> Tensor + variants: function, method + tags: pointwise + +- func: float_power.Scalar_out(Scalar self, Tensor exponent, *, Tensor(a!) out) -> Tensor(a!) + tags: pointwise + +- func: float_power.Scalar(Scalar self, Tensor exponent) -> Tensor + tags: pointwise + +- func: float_power.Tensor_Scalar_out(Tensor self, Scalar exponent, *, Tensor(a!) out) -> Tensor(a!) + tags: pointwise + +- func: float_power.Tensor_Scalar(Tensor self, Scalar exponent) -> Tensor + variants: function, method + tags: pointwise + +- func: float_power_.Scalar(Tensor(a!) self, Scalar exponent) -> Tensor(a!) + variants: method + tags: pointwise + +- func: float_power_.Tensor(Tensor(a!) 
self, Tensor exponent) -> Tensor(a!) + variants: method + tags: pointwise + +- func: normal_(Tensor(a!) self, float mean=0, float std=1, *, Generator? generator=None) -> Tensor(a!) + device_check: NoCheck # TensorIterator + tags: nondeterministic_seeded + variants: method + dispatch: + CPU, CUDA: normal_ + MPS: normal_mps_ + Meta: normal_meta_ + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: normal_sparse_csr_ + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: normal_nested_ + autogen: normal.out + +# Only used by the functionalization pass. +# Normally, the codegen would be able to generate a normal() NativeFunction, +# but we can't due to overload ambiguity with normal.Tensor_float. +- func: normal_functional(Tensor self, float mean=0, float std=1, *, Generator? generator=None) -> Tensor + device_check: NoCheck # TensorIterator + tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutograd: normal_functional + +- func: normal.Tensor_float_out(Tensor mean, float std=1, *, Generator? generator=None, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded + dispatch: + CPU, CUDA: normal_out + MPS: normal_mps_out + Meta: normal_out_meta + +- func: normal.Tensor_float(Tensor mean, float std=1, *, Generator? generator=None) -> Tensor + dispatch: + CPU, CUDA: normal + MPS: normal_mps + Meta: normal_meta + tags: nondeterministic_seeded + +- func: normal.float_Tensor_out(float mean, Tensor std, *, Generator? generator=None, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA: normal_out + Meta: normal_out_meta + MPS: normal_mps_out + tags: nondeterministic_seeded + +- func: normal.float_Tensor(float mean, Tensor std, *, Generator? generator=None) -> Tensor + dispatch: + CPU, CUDA: normal + MPS: normal_mps + Meta: normal_meta + tags: nondeterministic_seeded + +- func: normal.Tensor_Tensor_out(Tensor mean, Tensor std, *, Generator? generator=None, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA: normal_out + Meta: normal_out_meta + MPS: normal_mps_out + tags: nondeterministic_seeded + +- func: normal.Tensor_Tensor(Tensor mean, Tensor std, *, Generator? generator=None) -> Tensor + dispatch: + CPU, CUDA: normal + MPS: normal_mps + Meta: normal_meta + tags: nondeterministic_seeded + +- func: normal.float_float(float mean, float std, SymInt[] size, *, Generator? generator=None, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeExplicitAutograd: normal + tags: nondeterministic_seeded + +- func: normal.float_float_out(float mean, float std, SymInt[] size, *, Generator? generator=None, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: normal_out + tags: nondeterministic_seeded + +- func: alias(Tensor(a) self) -> Tensor(a) + variants: method, function + dispatch: + CompositeExplicitAutograd: alias + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: alias_nested + tags: core + +- func: _amp_foreach_non_finite_check_and_unscale_(Tensor(a!)[] self, Tensor(b!) found_inf, Tensor inv_scale) -> () + variants: function + dispatch: + CUDA: _amp_foreach_non_finite_check_and_unscale_cuda_ + CPU: _amp_foreach_non_finite_check_and_unscale_cpu_ + MPS: _amp_foreach_non_finite_check_and_unscale_mps_ + autogen: _amp_foreach_non_finite_check_and_unscale, _amp_foreach_non_finite_check_and_unscale.out + +- func: _amp_update_scale_(Tensor(a!) self, Tensor(b!) growth_tracker, Tensor found_inf, float scale_growth_factor, float scale_backoff_factor, int growth_interval) -> Tensor(a!) 
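+# [editorial note, not upstream text] The overload ambiguity mentioned above: `normal`
+# already has Tensor/float, float/Tensor and Tensor/Tensor forms, so an ordinary
+# functional counterpart of in-place `normal_` could not be generated;
+# `normal_functional` exists only for the functionalization pass. The public overloads,
+# illustratively:
+#   >>> import torch
+#   >>> torch.normal(torch.zeros(2), 1.0).shape      # normal.Tensor_float
+#   torch.Size([2])
+#   >>> torch.normal(0.0, torch.ones(2)).shape       # normal.float_Tensor
+#   torch.Size([2])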
+ variants: function + dispatch: + CUDA: _amp_update_scale_cuda_ + CPU: _amp_update_scale_cpu_ + MPS: _amp_update_scale_mps_ + autogen: _amp_update_scale, _amp_update_scale.out + + #- func: _cat(Tensor[] tensors, int dim=0) -> Tensor + #dispatch: + #CPU: _cat_cpu + #CUDA: cat_cuda + #MPS: cat_mps + #QuantizedCPU: cat_quantized_cpu + + #- func: _cat.out(Tensor[] tensors, int dim=0, *, Tensor(a!) out) -> Tensor(a!) + #dispatch: + #CPU: _cat_out_cpu + #CUDA: cat_out_cuda + #QuantizedCPU: cat_out_quantized_cpu + +- func: _foreach_add.Scalar(Tensor[] self, Scalar scalar) -> Tensor[] + device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices + variants: function + dispatch: + CompositeExplicitAutograd: foreach_tensor_add_scalar_kernel_slow + CUDA: foreach_tensor_add_scalar_kernel_cuda + +- func: _foreach_add_.Scalar(Tensor(a!)[] self, Scalar scalar) -> () + device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices + variants: function + dispatch: + CompositeExplicitAutograd: foreach_tensor_add_scalar_kernel_slow_ + CUDA: foreach_tensor_add_scalar_kernel_cuda_ + MTIA: foreach_tensor_add_scalar_kernel_mtia_ + autogen: _foreach_add.Scalar_out + +- func: _foreach_add.List(Tensor[] self, Tensor[] other, *, Scalar alpha=1) -> Tensor[] + device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices + variants: function + dispatch: + CompositeExplicitAutograd: foreach_tensor_add_list_kernel_slow + CUDA: foreach_tensor_add_list_kernel_cuda + MTIA: foreach_tensor_add_list_kernel_mtia + +- func: _foreach_add_.List(Tensor(a!)[] self, Tensor[] other, *, Scalar alpha=1) -> () + device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices + variants: function + dispatch: + CompositeExplicitAutograd: foreach_tensor_add_list_kernel_slow_ + CUDA: foreach_tensor_add_list_kernel_cuda_ + MTIA: foreach_tensor_add_list_kernel_mtia_ + autogen: _foreach_add.List_out + +- func: _foreach_add.ScalarList(Tensor[] self, Scalar[] scalars) -> Tensor[] + device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices + variants: function + dispatch: + CompositeExplicitAutograd: foreach_tensor_add_scalarlist_kernel_slow + CUDA: foreach_tensor_add_scalarlist_kernel_cuda + +- func: _foreach_add_.ScalarList(Tensor(a!)[] self, Scalar[] scalars) -> () + device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices + variants: function + dispatch: + CompositeExplicitAutograd: foreach_tensor_add_scalarlist_kernel_slow_ + CUDA: foreach_tensor_add_scalarlist_kernel_cuda_ + autogen: _foreach_add.ScalarList_out + +- func: _foreach_add.Tensor(Tensor[] self, Tensor other, *, Scalar alpha=1) -> Tensor[] + device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices + variants: function + dispatch: + CompositeExplicitAutograd: foreach_tensor_add_tensor_kernel_slow + CUDA: foreach_tensor_add_tensor_kernel_cuda + +- func: _foreach_add_.Tensor(Tensor(a!)[] self, Tensor other, *, Scalar alpha=1) -> () + device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices + variants: function + dispatch: + CompositeExplicitAutograd: foreach_tensor_add_tensor_kernel_slow_ + CUDA: foreach_tensor_add_tensor_kernel_cuda_ + MTIA: foreach_tensor_add_tensor_kernel_mtia_ + autogen: _foreach_add.Tensor_out + +- func: _foreach_sub.Scalar(Tensor[] self, 
+
+- func: _foreach_sub.Scalar(Tensor[] self, Scalar scalar) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_sub_scalar_kernel_slow
+    CUDA: foreach_tensor_sub_scalar_kernel_cuda
+
+- func: _foreach_sub_.Scalar(Tensor(a!)[] self, Scalar scalar) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_sub_scalar_kernel_slow_
+    CUDA: foreach_tensor_sub_scalar_kernel_cuda_
+  autogen: _foreach_sub.Scalar_out
+
+- func: _foreach_sub.List(Tensor[] self, Tensor[] other, *, Scalar alpha=1) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_sub_list_kernel_slow
+    CUDA: foreach_tensor_sub_list_kernel_cuda
+
+- func: _foreach_sub_.List(Tensor(a!)[] self, Tensor[] other, *, Scalar alpha=1) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_sub_list_kernel_slow_
+    CUDA: foreach_tensor_sub_list_kernel_cuda_
+  autogen: _foreach_sub.List_out
+
+- func: _foreach_sub.ScalarList(Tensor[] self, Scalar[] scalars) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_sub_scalarlist_kernel_slow
+    CUDA: foreach_tensor_sub_scalarlist_kernel_cuda
+
+- func: _foreach_sub_.ScalarList(Tensor(a!)[] self, Scalar[] scalars) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_sub_scalarlist_kernel_slow_
+    CUDA: foreach_tensor_sub_scalarlist_kernel_cuda_
+  autogen: _foreach_sub.ScalarList_out
+
+- func: _foreach_mul.Scalar(Tensor[] self, Scalar scalar) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_mul_scalar_kernel_slow
+    CUDA: foreach_tensor_mul_scalar_kernel_cuda
+
+- func: _foreach_mul_.Scalar(Tensor(a!)[] self, Scalar scalar) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_mul_scalar_kernel_slow_
+    CUDA: foreach_tensor_mul_scalar_kernel_cuda_
+    MTIA: foreach_tensor_mul_scalar_kernel_mtia_
+  autogen: _foreach_mul.Scalar_out
+
+- func: _foreach_mul.List(Tensor[] self, Tensor[] other) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_mul_list_kernel_slow
+    CUDA: foreach_tensor_mul_list_kernel_cuda
+    MTIA: foreach_tensor_mul_list_kernel_mtia
+
+- func: _foreach_mul_.List(Tensor(a!)[] self, Tensor[] other) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_mul_list_kernel_slow_
+    CUDA: foreach_tensor_mul_list_kernel_cuda_
+    MTIA: foreach_tensor_mul_list_kernel_mtia_
+  autogen: _foreach_mul.List_out
+
+- func: _foreach_mul.ScalarList(Tensor[] self, Scalar[] scalars) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_mul_scalarlist_kernel_slow
+    CUDA: foreach_tensor_mul_scalarlist_kernel_cuda
+
+- func: _foreach_mul_.ScalarList(Tensor(a!)[] self, Scalar[] scalars) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_mul_scalarlist_kernel_slow_
+    CUDA: foreach_tensor_mul_scalarlist_kernel_cuda_
+  autogen: _foreach_mul.ScalarList_out
+
+- func: _foreach_mul.Tensor(Tensor[] self, Tensor other) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_mul_tensor_kernel_slow
+    CUDA: foreach_tensor_mul_tensor_kernel_cuda
+    MTIA: foreach_tensor_mul_tensor_kernel_mtia
+
+- func: _foreach_mul_.Tensor(Tensor(a!)[] self, Tensor other) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_mul_tensor_kernel_slow_
+    CUDA: foreach_tensor_mul_tensor_kernel_cuda_
+    MTIA: foreach_tensor_mul_tensor_kernel_mtia_
+  autogen: _foreach_mul.Tensor_out
+
+- func: _foreach_div.Scalar(Tensor[] self, Scalar scalar) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_div_scalar_kernel_slow
+    CUDA: foreach_tensor_div_scalar_kernel_cuda
+
+- func: _foreach_div_.Scalar(Tensor(a!)[] self, Scalar scalar) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_div_scalar_kernel_slow_
+    CUDA: foreach_tensor_div_scalar_kernel_cuda_
+  autogen: _foreach_div.Scalar_out
+
+- func: _foreach_div.List(Tensor[] self, Tensor[] other) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_div_list_kernel_slow
+    CUDA: foreach_tensor_div_list_kernel_cuda
+    MTIA: foreach_tensor_div_list_kernel_mtia
+
+- func: _foreach_div_.List(Tensor(a!)[] self, Tensor[] other) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_div_list_kernel_slow_
+    CUDA: foreach_tensor_div_list_kernel_cuda_
+    MTIA: foreach_tensor_div_list_kernel_mtia_
+  autogen: _foreach_div.List_out
+
+- func: _foreach_div.ScalarList(Tensor[] self, Scalar[] scalars) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_div_scalarlist_kernel_slow
+    CUDA: foreach_tensor_div_scalarlist_kernel_cuda
+
+- func: _foreach_div_.ScalarList(Tensor(a!)[] self, Scalar[] scalars) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_div_scalarlist_kernel_slow_
+    CUDA: foreach_tensor_div_scalarlist_kernel_cuda_
+  autogen: _foreach_div.ScalarList_out
+
+- func: _foreach_div.Tensor(Tensor[] self, Tensor other) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_div_tensor_kernel_slow
+    CUDA: foreach_tensor_div_tensor_kernel_cuda
+    MTIA: foreach_tensor_div_tensor_kernel_mtia
+
+- func: _foreach_div_.Tensor(Tensor(a!)[] self, Tensor other) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_div_tensor_kernel_slow_
+    CUDA: foreach_tensor_div_tensor_kernel_cuda_
+    MTIA: foreach_tensor_div_tensor_kernel_mtia_
+  autogen: _foreach_div.Tensor_out
+
+- func: _foreach_clamp_max.Scalar(Tensor[] self, Scalar scalar) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_max_scalar_kernel_slow
+    CUDA: foreach_tensor_clamp_max_scalar_kernel_cuda
+
+- func: _foreach_clamp_max_.Scalar(Tensor(a!)[] self, Scalar scalar) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_max_scalar_kernel_slow_
+    CUDA: foreach_tensor_clamp_max_scalar_kernel_cuda_
+  autogen: _foreach_clamp_max.Scalar_out
+
+- func: _foreach_clamp_max.List(Tensor[] self, Tensor[] other) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_max_list_kernel_slow
+    CUDA: foreach_tensor_clamp_max_list_kernel_cuda
+
+- func: _foreach_clamp_max_.List(Tensor(a!)[] self, Tensor[] other) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_max_list_kernel_slow_
+    CUDA: foreach_tensor_clamp_max_list_kernel_cuda_
+  autogen: _foreach_clamp_max.List_out
+
+- func: _foreach_clamp_max.ScalarList(Tensor[] self, Scalar[] scalars) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_max_scalarlist_kernel_slow
+    CUDA: foreach_tensor_clamp_max_scalarlist_kernel_cuda
+
+- func: _foreach_clamp_max_.ScalarList(Tensor(a!)[] self, Scalar[] scalars) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_max_scalarlist_kernel_slow_
+    CUDA: foreach_tensor_clamp_max_scalarlist_kernel_cuda_
+  autogen: _foreach_clamp_max.ScalarList_out
+
+- func: _foreach_clamp_min.Scalar(Tensor[] self, Scalar scalar) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_min_scalar_kernel_slow
+    CUDA: foreach_tensor_clamp_min_scalar_kernel_cuda
+
+- func: _foreach_clamp_min_.Scalar(Tensor(a!)[] self, Scalar scalar) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_min_scalar_kernel_slow_
+    CUDA: foreach_tensor_clamp_min_scalar_kernel_cuda_
+  autogen: _foreach_clamp_min.Scalar_out
+
+- func: _foreach_clamp_min.List(Tensor[] self, Tensor[] other) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_min_list_kernel_slow
+    CUDA: foreach_tensor_clamp_min_list_kernel_cuda
+
+- func: _foreach_clamp_min_.List(Tensor(a!)[] self, Tensor[] other) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_min_list_kernel_slow_
+    CUDA: foreach_tensor_clamp_min_list_kernel_cuda_
+  autogen: _foreach_clamp_min.List_out
+
+- func: _foreach_clamp_min.ScalarList(Tensor[] self, Scalar[] scalars) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_min_scalarlist_kernel_slow
+    CUDA: foreach_tensor_clamp_min_scalarlist_kernel_cuda
+
+- func: _foreach_clamp_min_.ScalarList(Tensor(a!)[] self, Scalar[] scalars) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_min_scalarlist_kernel_slow_
+    CUDA: foreach_tensor_clamp_min_scalarlist_kernel_cuda_
+  autogen: _foreach_clamp_min.ScalarList_out
+
+# foreach_minimum/maximum dispatches to clamp_max/min
+- func: _foreach_maximum.Scalar(Tensor[] self, Scalar scalar) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_min_scalar_kernel_slow
+    CUDA: foreach_tensor_clamp_min_scalar_kernel_cuda
+
+- func: _foreach_maximum_.Scalar(Tensor(a!)[] self, Scalar scalar) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_min_scalar_kernel_slow_
+    CUDA: foreach_tensor_clamp_min_scalar_kernel_cuda_
+    MTIA: foreach_tensor_maximum_scalar_kernel_mtia_
+  autogen: _foreach_maximum.Scalar_out
+
+# foreach_minimum/maximum dispatches to clamp_max/min
+- func: _foreach_maximum.List(Tensor[] self, Tensor[] other) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_min_list_kernel_slow
+    CUDA: foreach_tensor_clamp_min_list_kernel_cuda
+
+- func: _foreach_maximum_.List(Tensor(a!)[] self, Tensor[] other) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_min_list_kernel_slow_
+    CUDA: foreach_tensor_clamp_min_list_kernel_cuda_
+  autogen: _foreach_maximum.List_out
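+
+# As the comments here note, _foreach_maximum lowers to the clamp_min
+# kernels because max(x, s) == clamp_min(x, s) elementwise, and likewise
+# min(x, s) == clamp_max(x, s). Illustrative check:
+#
+#   import torch
+#   t = torch.tensor([-2.0, 0.5, 3.0])
+#   assert torch.equal(torch.maximum(t, torch.tensor(1.0)), t.clamp_min(1.0))
+#   assert torch.equal(torch.minimum(t, torch.tensor(1.0)), t.clamp_max(1.0))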
+
+# foreach_minimum/maximum dispatches to clamp_max/min
+- func: _foreach_maximum.ScalarList(Tensor[] self, Scalar[] scalars) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_min_scalarlist_kernel_slow
+    CUDA: foreach_tensor_clamp_min_scalarlist_kernel_cuda
+
+- func: _foreach_maximum_.ScalarList(Tensor(a!)[] self, Scalar[] scalars) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_min_scalarlist_kernel_slow_
+    CUDA: foreach_tensor_clamp_min_scalarlist_kernel_cuda_
+  autogen: _foreach_maximum.ScalarList_out
+
+- func: _foreach_minimum.Scalar(Tensor[] self, Scalar scalar) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_max_scalar_kernel_slow
+    CUDA: foreach_tensor_clamp_max_scalar_kernel_cuda
+
+- func: _foreach_minimum_.Scalar(Tensor(a!)[] self, Scalar scalar) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_max_scalar_kernel_slow_
+    CUDA: foreach_tensor_clamp_max_scalar_kernel_cuda_
+  autogen: _foreach_minimum.Scalar_out
+
+- func: _foreach_minimum.List(Tensor[] self, Tensor[] other) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_max_list_kernel_slow
+    CUDA: foreach_tensor_clamp_max_list_kernel_cuda
+
+- func: _foreach_minimum_.List(Tensor(a!)[] self, Tensor[] other) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_max_list_kernel_slow_
+    CUDA: foreach_tensor_clamp_max_list_kernel_cuda_
+  autogen: _foreach_minimum.List_out
+
+- func: _foreach_minimum.ScalarList(Tensor[] self, Scalar[] scalars) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_max_scalarlist_kernel_slow
+    CUDA: foreach_tensor_clamp_max_scalarlist_kernel_cuda
+
+- func: _foreach_minimum_.ScalarList(Tensor(a!)[] self, Scalar[] scalars) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_clamp_max_scalarlist_kernel_slow_
+    CUDA: foreach_tensor_clamp_max_scalarlist_kernel_cuda_
+  autogen: _foreach_minimum.ScalarList_out
+
+- func: _foreach_addcdiv.Scalar(Tensor[] self, Tensor[] tensor1, Tensor[] tensor2, Scalar value=1) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_addcdiv_scalar_slow
+    CUDA: foreach_tensor_addcdiv_scalar_cuda
+
+- func: _foreach_addcdiv.ScalarList(Tensor[] self, Tensor[] tensor1, Tensor[] tensor2, Scalar[] scalars) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_addcdiv_scalarlist_slow
+    CUDA: foreach_tensor_addcdiv_scalarlist_cuda
+
+- func: _foreach_addcdiv.Tensor(Tensor[] self, Tensor[] tensor1, Tensor[] tensor2, Tensor scalars) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_addcdiv_tensor_slow
+    CUDA: foreach_tensor_addcdiv_tensor_cuda
+
+- func: _foreach_addcdiv_.Scalar(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, Scalar value=1) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_addcdiv_scalar_slow_
+    CUDA: foreach_tensor_addcdiv_scalar_cuda_
+  autogen: _foreach_addcdiv.Scalar_out
+
+- func: _foreach_addcdiv_.ScalarList(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, Scalar[] scalars) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_addcdiv_scalarlist_slow_
+    CUDA: foreach_tensor_addcdiv_scalarlist_cuda_
+  autogen: _foreach_addcdiv.ScalarList_out
+
+- func: _foreach_addcdiv_.Tensor(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, Tensor scalars) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_addcdiv_tensor_slow_
+    CUDA: foreach_tensor_addcdiv_tensor_cuda_
+  autogen: _foreach_addcdiv.Tensor_out
+
+- func: _foreach_addcmul.Scalar(Tensor[] self, Tensor[] tensor1, Tensor[] tensor2, Scalar value=1) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_addcmul_scalar_slow
+    CUDA: foreach_tensor_addcmul_scalar_cuda
+    MTIA: foreach_tensor_addcmul_scalar_mtia
+
+- func: _foreach_addcmul.ScalarList(Tensor[] self, Tensor[] tensor1, Tensor[] tensor2, Scalar[] scalars) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_addcmul_scalarlist_slow
+    CUDA: foreach_tensor_addcmul_scalarlist_cuda
+
+- func: _foreach_addcmul.Tensor(Tensor[] self, Tensor[] tensor1, Tensor[] tensor2, Tensor scalars) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_addcmul_tensor_slow
+    CUDA: foreach_tensor_addcmul_tensor_cuda
+
+- func: _foreach_addcmul_.Scalar(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, Scalar value=1) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_addcmul_scalar_slow_
+    CUDA: foreach_tensor_addcmul_scalar_cuda_
+    MTIA: foreach_tensor_addcmul_scalar_mtia_
+  autogen: _foreach_addcmul.Scalar_out
+
+- func: _foreach_addcmul_.ScalarList(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, Scalar[] scalars) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_addcmul_scalarlist_slow_
+    CUDA: foreach_tensor_addcmul_scalarlist_cuda_
+  autogen: _foreach_addcmul.ScalarList_out
+
+- func: _foreach_addcmul_.Tensor(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, Tensor scalars) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_addcmul_tensor_slow_
+    CUDA: foreach_tensor_addcmul_tensor_cuda_
+  autogen: _foreach_addcmul.Tensor_out
+
+- func: _foreach_abs(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_abs_slow
+    CUDA: foreach_tensor_abs_cuda
+    MTIA: foreach_tensor_abs_mtia
+
+- func: _foreach_abs_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_abs_slow_
+    CUDA: foreach_tensor_abs_cuda_
+    MTIA: foreach_tensor_abs_mtia_
+  autogen: _foreach_abs.out
+
+- func: _foreach_acos(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_acos_slow
+    CUDA: foreach_tensor_acos_cuda
+
+- func: _foreach_acos_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_acos_slow_
+    CUDA: foreach_tensor_acos_cuda_
+  autogen: _foreach_acos.out
+
+- func: _foreach_asin(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_asin_slow
+    CUDA: foreach_tensor_asin_cuda
+
+- func: _foreach_asin_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_asin_slow_
+    CUDA: foreach_tensor_asin_cuda_
+  autogen: _foreach_asin.out
+
+- func: _foreach_atan(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_atan_slow
+    CUDA: foreach_tensor_atan_cuda
+
+- func: _foreach_atan_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_atan_slow_
+    CUDA: foreach_tensor_atan_cuda_
+  autogen: _foreach_atan.out
+
+- func: _foreach_ceil(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_ceil_slow
+    CUDA: foreach_tensor_ceil_cuda
+
+- func: _foreach_ceil_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_ceil_slow_
+    CUDA: foreach_tensor_ceil_cuda_
+  autogen: _foreach_ceil.out
+
+- func: _foreach_cos(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_cos_slow
+    CUDA: foreach_tensor_cos_cuda
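+
+# The recurring NoCheck annotation exists because each of these ops has two
+# implementations: a CUDA fast path that fuses the whole tensor list into a
+# few kernel launches, and a slow path that simply loops per tensor (used,
+# e.g., when the tensors live on different devices). Semantically the two
+# agree; illustrative check:
+#
+#   import torch
+#   xs = [torch.rand(5) for _ in range(3)]
+#   assert all(torch.allclose(a, b.cos())
+#              for a, b in zip(torch._foreach_cos(xs), xs))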
+
+- func: _foreach_cos_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_cos_slow_
+    CUDA: foreach_tensor_cos_cuda_
+  autogen: _foreach_cos.out
+
+- func: _foreach_cosh(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_cosh_slow
+    CUDA: foreach_tensor_cosh_cuda
+
+- func: _foreach_cosh_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_cosh_slow_
+    CUDA: foreach_tensor_cosh_cuda_
+  autogen: _foreach_cosh.out
+
+- func: _foreach_erf(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_erf_slow
+    CUDA: foreach_tensor_erf_cuda
+
+- func: _foreach_erf_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_erf_slow_
+    CUDA: foreach_tensor_erf_cuda_
+  autogen: _foreach_erf.out
+
+- func: _foreach_erfc(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_erfc_slow
+    CUDA: foreach_tensor_erfc_cuda
+
+- func: _foreach_erfc_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_erfc_slow_
+    CUDA: foreach_tensor_erfc_cuda_
+  autogen: _foreach_erfc.out
+
+- func: _foreach_exp(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_exp_slow
+    CUDA: foreach_tensor_exp_cuda
+
+- func: _foreach_exp_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_exp_slow_
+    CUDA: foreach_tensor_exp_cuda_
+  autogen: _foreach_exp.out
+
+- func: _foreach_expm1(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_expm1_slow
+    CUDA: foreach_tensor_expm1_cuda
+
+- func: _foreach_expm1_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_expm1_slow_
+    CUDA: foreach_tensor_expm1_cuda_
+  autogen: _foreach_expm1.out
+
+- func: _foreach_floor(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_floor_slow
+    CUDA: foreach_tensor_floor_cuda
+
+- func: _foreach_floor_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_floor_slow_
+    CUDA: foreach_tensor_floor_cuda_
+  autogen: _foreach_floor.out
+
+- func: _foreach_frac(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_frac_slow
+    CUDA: foreach_tensor_frac_cuda
+
+- func: _foreach_frac_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_frac_slow_
+    CUDA: foreach_tensor_frac_cuda_
+  autogen: _foreach_frac.out
+
+- func: _foreach_lerp.List(Tensor[] self, Tensor[] tensors1, Tensor[] weights) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_ternary_lerp_slow
+    CUDA: foreach_tensor_lerp_ternary_cuda
+  autogen: _foreach_lerp.List_out
+
+- func: _foreach_lerp_.List(Tensor(a!)[] self, Tensor[] tensors1, Tensor[] weights) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_ternary_lerp_slow_
+    CUDA: foreach_tensor_lerp_ternary_cuda_
+  autogen: _foreach_lerp.List_out
+
+- func: _foreach_lerp.Scalar(Tensor[] self, Tensor[] tensors1, Scalar weight) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_lerp_list_kernel_slow
+    CUDA: foreach_tensor_lerp_list_cuda
+  autogen: _foreach_lerp.Scalar_out
+
+- func: _foreach_lerp_.Scalar(Tensor(a!)[] self, Tensor[] tensors1, Scalar weight) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_lerp_list_kernel_slow_
+    CUDA: foreach_tensor_lerp_list_cuda_
+  autogen: _foreach_lerp.Scalar_out
+
+- func: _foreach_lerp.ScalarList(Tensor[] self, Tensor[] tensors1, Scalar[] weight) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_lerp_scalarlist_kernel_slow
+    CUDA: foreach_tensor_lerp_scalarlist_cuda
+  autogen: _foreach_lerp.ScalarList_out
+
+- func: _foreach_lerp_.ScalarList(Tensor(a!)[] self, Tensor[] tensors1, Scalar[] weight) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_lerp_scalarlist_kernel_slow_
+    CUDA: foreach_tensor_lerp_scalarlist_cuda_
+  autogen: _foreach_lerp.ScalarList_out
+
+- func: _foreach_lgamma(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_lgamma_slow
+    CUDA: foreach_tensor_lgamma_cuda
+
+- func: _foreach_lgamma_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_lgamma_slow_
+    CUDA: foreach_tensor_lgamma_cuda_
+  autogen: _foreach_lgamma.out
+
+- func: _foreach_log(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_log_slow
+    CUDA: foreach_tensor_log_cuda
+
+- func: _foreach_log_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_log_slow_
+    CUDA: foreach_tensor_log_cuda_
+  autogen: _foreach_log.out
+
+- func: _foreach_log10(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_log10_slow
+    CUDA: foreach_tensor_log10_cuda
+
+- func: _foreach_log10_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_log10_slow_
+    CUDA: foreach_tensor_log10_cuda_
+  autogen: _foreach_log10.out
+
+- func: _foreach_log1p(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_log1p_slow
+    CUDA: foreach_tensor_log1p_cuda
+
+- func: _foreach_log1p_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_log1p_slow_
+    CUDA: foreach_tensor_log1p_cuda_
+  autogen: _foreach_log1p.out
+
+- func: _foreach_log2(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_log2_slow
+    CUDA: foreach_tensor_log2_cuda
+
+- func: _foreach_log2_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_log2_slow_
+    CUDA: foreach_tensor_log2_cuda_
+  autogen: _foreach_log2.out
+
+- func: _foreach_max(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_max_slow
+    CUDA: foreach_tensor_max_cuda
+  autogen: _foreach_max.out
+
+- func: _foreach_neg(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_neg_slow
+    CUDA: foreach_tensor_neg_cuda
+
+- func: _foreach_neg_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_neg_slow_
+    CUDA: foreach_tensor_neg_cuda_
+  autogen: _foreach_neg.out
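+
+# _foreach_norm below computes one norm per tensor in a single pass; it is
+# what torch.nn.utils.clip_grad_norm_ relies on in its foreach path, so long
+# parameter lists stay cheap. Illustrative sketch:
+#
+#   import torch
+#   grads = [torch.randn(10) for _ in range(3)]
+#   per_tensor = torch._foreach_norm(grads)  # list of scalar tensors (ord=2)
+#   total = torch.linalg.vector_norm(torch.stack(per_tensor))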
+
+- func: _foreach_norm.Scalar(Tensor[] self, Scalar ord=2, ScalarType? dtype=None) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_norm_slow
+    CUDA: foreach_tensor_norm_cuda
+    MTIA: foreach_tensor_norm_mtia
+  autogen: _foreach_norm.Scalar_out
+
+- func: _foreach_pow.List(Tensor[] self, Tensor[] exponent) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_pow_list_kernel_slow
+    CUDA: foreach_tensor_pow_list_kernel_cuda
+
+- func: _foreach_pow.Scalar(Tensor[] self, Scalar exponent) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_pow_scalar_kernel_slow
+    CUDA: foreach_tensor_pow_scalar_kernel_cuda
+
+- func: _foreach_pow.ScalarList(Tensor[] self, Scalar[] exponent) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_pow_scalarlist_kernel_slow
+    CUDA: foreach_tensor_pow_scalarlist_kernel_cuda
+
+- func: _foreach_pow.ScalarAndTensor(Scalar self, Tensor[] exponent) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_scalar_pow_list_kernel_slow
+    CUDA: foreach_scalar_pow_list_kernel_cuda
+
+- func: _foreach_pow_.List(Tensor(a!)[] self, Tensor[] exponent) -> ()
+  device_check: NoCheck
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_pow_list_kernel_slow_
+    CUDA: foreach_tensor_pow_list_kernel_cuda_
+  autogen: _foreach_pow.List_out
+
+- func: _foreach_pow_.Scalar(Tensor(a!)[] self, Scalar exponent) -> ()
+  device_check: NoCheck
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_pow_scalar_kernel_slow_
+    CUDA: foreach_tensor_pow_scalar_kernel_cuda_
+  autogen: _foreach_pow.Scalar_out
+
+- func: _foreach_pow_.ScalarList(Tensor(a!)[] self, Scalar[] exponent) -> ()
+  device_check: NoCheck
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_pow_scalarlist_kernel_slow_
+    CUDA: foreach_tensor_pow_scalarlist_kernel_cuda_
+  autogen: _foreach_pow.ScalarList_out
+
+- func: _foreach_reciprocal(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_reciprocal_slow
+    CUDA: foreach_tensor_reciprocal_cuda
+
+- func: _foreach_reciprocal_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_reciprocal_slow_
+    CUDA: foreach_tensor_reciprocal_cuda_
+  autogen: _foreach_reciprocal.out
+
+- func: _foreach_round(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_round_slow
+    CUDA: foreach_tensor_round_cuda
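+
+# Naming convention throughout this file: the trailing-underscore variant
+# mutates its input list in place and returns (), while the plain variant
+# allocates fresh outputs. Illustrative:
+#
+#   import torch
+#   xs = [torch.tensor([1.4, 2.6])]
+#   ys = torch._foreach_round(xs)  # xs unchanged, ys is new storage
+#   torch._foreach_round_(xs)      # xs rounded in place, returns None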
+
+- func: _foreach_round_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_round_slow_
+    CUDA: foreach_tensor_round_cuda_
+  autogen: _foreach_round.out
+
+- func: _foreach_rsqrt(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_rsqrt_slow
+    CUDA: foreach_tensor_rsqrt_cuda
+
+- func: _foreach_rsqrt_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_rsqrt_slow_
+    CUDA: foreach_tensor_rsqrt_cuda_
+  autogen: _foreach_rsqrt.out
+
+- func: _foreach_sigmoid(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_sigmoid_slow
+    CUDA: foreach_tensor_sigmoid_cuda
+
+- func: _foreach_sigmoid_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_sigmoid_slow_
+    CUDA: foreach_tensor_sigmoid_cuda_
+  autogen: _foreach_sigmoid.out
+
+- func: _foreach_sign(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_sign_slow
+    CUDA: foreach_tensor_sign_cuda
+
+- func: _foreach_sign_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_sign_slow_
+    CUDA: foreach_tensor_sign_cuda_
+  autogen: _foreach_sign.out
+
+- func: _foreach_sin(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_sin_slow
+    CUDA: foreach_tensor_sin_cuda
+
+- func: _foreach_sin_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_sin_slow_
+    CUDA: foreach_tensor_sin_cuda_
+  autogen: _foreach_sin.out
+
+- func: _foreach_sinh(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_sinh_slow
+    CUDA: foreach_tensor_sinh_cuda
+
+- func: _foreach_sinh_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_sinh_slow_
+    CUDA: foreach_tensor_sinh_cuda_
+  autogen: _foreach_sinh.out
+
+- func: _foreach_sqrt(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_sqrt_slow
+    CUDA: foreach_tensor_sqrt_cuda
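+
+# Taken together, the addcmul_/sqrt/addcdiv_ building blocks in this file
+# are enough to express an Adam-style second-moment update without a Python
+# loop. A rough, illustrative sketch only (lr/eps/beta values hypothetical):
+#
+#   import torch
+#   params = [torch.zeros(3)]; grads = [torch.randn(3)]
+#   exp_avg_sq = [torch.ones(3)]
+#   torch._foreach_mul_(exp_avg_sq, 0.999)                     # v *= beta2
+#   torch._foreach_addcmul_(exp_avg_sq, grads, grads, value=0.001)
+#   denom = torch._foreach_sqrt(exp_avg_sq)
+#   torch._foreach_add_(denom, 1e-8)                           # + eps
+#   torch._foreach_addcdiv_(params, grads, denom, value=-1e-3) # p -= lr*g/d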
+
+- func: _foreach_sqrt_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_sqrt_slow_
+    CUDA: foreach_tensor_sqrt_cuda_
+    MTIA: foreach_tensor_sqrt_mtia_
+  autogen: _foreach_sqrt.out
+
+- func: _foreach_tan(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_tan_slow
+    CUDA: foreach_tensor_tan_cuda
+
+- func: _foreach_tan_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_tan_slow_
+    CUDA: foreach_tensor_tan_cuda_
+  autogen: _foreach_tan.out
+
+- func: _foreach_tanh(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_tanh_slow
+    CUDA: foreach_tensor_tanh_cuda
+
+- func: _foreach_tanh_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_tanh_slow_
+    CUDA: foreach_tensor_tanh_cuda_
+  autogen: _foreach_tanh.out
+
+- func: _foreach_trunc(Tensor[] self) -> Tensor[]
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_trunc_slow
+    CUDA: foreach_tensor_trunc_cuda
+
+- func: _foreach_trunc_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_trunc_slow_
+    CUDA: foreach_tensor_trunc_cuda_
+  autogen: _foreach_trunc.out
+
+- func: _foreach_zero_(Tensor(a!)[] self) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_zero_slow_
+    CUDA: foreach_tensor_zero_cuda_
+  autogen: _foreach_zero, _foreach_zero.out
+
+- func: _foreach_copy_(Tensor(a!)[] self, Tensor[] src, bool non_blocking=False) -> ()
+  device_check: NoCheck # foreach kernels fall back to slow path when tensors are on different devices
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: foreach_tensor_copy_list_kernel_slow_
+    CUDA: foreach_tensor_copy_list_kernel_cuda_
+    MTIA: foreach_tensor_copy_list_kernel_mtia_
+  autogen: _foreach_copy.out
+
+- func: _foreach_copy(Tensor[] self, Tensor[] src, bool non_blocking=False) -> Tensor[] self_out
+  device_check: NoCheck
+  variants: function
+  dispatch:
+    CompositeExplicitAutograd: _foreach_copy
+    MTIA: foreach_tensor_copy_list_kernel_mtia
+
+- func: bucketize.Tensor(Tensor self, Tensor boundaries, *, bool out_int32=False, bool right=False) -> Tensor
+  dispatch:
+    CPU: bucketize_cpu
+    CUDA: bucketize_cuda
+    MPS: bucketize_mps
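+
+# bucketize and the searchsorted overloads below both binary-search a sorted
+# 1-D boundaries/sorted_sequence; `right` picks which side of equal values
+# the returned insertion index lands on. Illustrative:
+#
+#   import torch
+#   b = torch.tensor([1, 3, 5, 7, 9])
+#   torch.bucketize(torch.tensor(3), b)              # tensor(1)
+#   torch.bucketize(torch.tensor(3), b, right=True)  # tensor(2)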
+
+- func: bucketize.Tensor_out(Tensor self, Tensor boundaries, *, bool out_int32=False, bool right=False, Tensor(a!) out) -> Tensor(a!)
+  dispatch:
+    CPU: bucketize_out_cpu
+    CUDA: bucketize_out_cuda
+    MPS: bucketize_out_mps
+
+- func: bucketize.Scalar(Scalar self, Tensor boundaries, *, bool out_int32=False, bool right=False) -> Tensor
+  dispatch:
+    CPU: bucketize_cpu
+    CUDA: bucketize_cuda
+    MPS: bucketize_mps
+  autogen: bucketize.Scalar_out
+
+- func: searchsorted.Tensor(Tensor sorted_sequence, Tensor self, *, bool out_int32=False, bool right=False, str? side=None, Tensor? sorter=None) -> Tensor
+  dispatch:
+    CPU: searchsorted_cpu
+    CUDA: searchsorted_cuda
+    MPS: searchsorted_mps
+
+- func: searchsorted.Tensor_out(Tensor sorted_sequence, Tensor self, *, bool out_int32=False, bool right=False, str? side=None, Tensor? sorter=None, Tensor(a!) out) -> Tensor(a!)
+  dispatch:
+    CPU: searchsorted_out_cpu
+    CUDA: searchsorted_out_cuda
+    MPS: searchsorted_out_mps
+
+- func: searchsorted.Scalar(Tensor sorted_sequence, Scalar self, *, bool out_int32=False, bool right=False, str? side=None, Tensor? sorter=None) -> Tensor
+  dispatch:
+    CPU: searchsorted_cpu
+    CUDA: searchsorted_cuda
+    MPS: searchsorted_mps
+
+- func: searchsorted.Scalar_out(Tensor sorted_sequence, Scalar self, *, bool out_int32=False, bool right=False, str? side=None, Tensor? sorter=None, Tensor(a!) out) -> Tensor(a!)
+  dispatch:
+    CPU: searchsorted_out_cpu
+    CUDA: searchsorted_out_cuda
+    MPS: searchsorted_out_mps
+
+- func: _convert_indices_from_coo_to_csr(Tensor self, int size, *, bool out_int32=False) -> Tensor
+  structured_delegate: _convert_indices_from_coo_to_csr.out
+
+- func: _convert_indices_from_coo_to_csr.out(Tensor self, int size, *, bool out_int32=False, Tensor(a!) out) -> Tensor(a!)
+  structured: True
+  dispatch:
+    CPU: _convert_indices_from_coo_to_csr_structured_cpu
+    CUDA: _convert_indices_from_coo_to_csr_structured_cuda
+
+- func: _convert_indices_from_csr_to_coo(Tensor crow_indices, Tensor col_indices, *, bool out_int32=False, bool transpose=False) -> Tensor
+  structured_delegate: _convert_indices_from_csr_to_coo.out
+
+- func: _convert_indices_from_csr_to_coo.out(Tensor crow_indices, Tensor col_indices, *, bool out_int32=False, bool transpose=False, Tensor(a!) out) -> Tensor(a!)
+  structured: True
+  dispatch:
+    CPU: _convert_indices_from_csr_to_coo_structured_cpu
+    CUDA: _convert_indices_from_csr_to_coo_structured_cuda
+
+## NN wrappers
+
+- func: mse_loss.out(Tensor self, Tensor target, int reduction=Mean, *, Tensor(a!) out) -> Tensor(a!)
+  device_check: NoCheck # TensorIterator
+  structured: True
+  structured_inherits: TensorIteratorBase
+  python_module: nn
+  dispatch:
+    CPU, CUDA: mse_loss_out
+    MPS: mse_loss_out_mps
+
+- func: mse_loss(Tensor self, Tensor target, int reduction=Mean) -> Tensor
+  device_check: NoCheck # TensorIterator
+  structured_delegate: mse_loss.out
+  python_module: nn
+
+- func: mse_loss_backward.grad_input(Tensor grad_output, Tensor self, Tensor target, int reduction, *, Tensor(a!) grad_input) -> Tensor(a!)
+  python_module: nn
+  dispatch:
+    CPU, CUDA: mse_loss_backward_out
+    MPS: mse_loss_backward_out_mps
+
+- func: mse_loss_backward(Tensor grad_output, Tensor self, Tensor target, int reduction) -> Tensor
+  python_module: nn
+  dispatch:
+    CPU, CUDA: mse_loss_backward
+    MPS: mse_loss_backward_mps
+
+- func: l1_loss(Tensor self, Tensor target, int reduction=Mean) -> Tensor
+  python_module: nn
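+
+# Entries in this "NN wrappers" section surface as torch.nn.functional ops;
+# `int reduction=Mean` refers to the Reduction enum (none/mean/sum), which
+# the Python layer passes as a string. Illustrative:
+#
+#   import torch
+#   import torch.nn.functional as F
+#   x, y = torch.randn(4), torch.randn(4)
+#   F.mse_loss(x, y, reduction="sum")   # scalar
+#   F.l1_loss(x, y, reduction="none")   # elementwise, shape (4,)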
+
+- func: multi_margin_loss.out(Tensor self, Tensor target, Scalar p=1, Scalar margin=1, Tensor? weight=None, int reduction=Mean, *, Tensor(a!) out) -> Tensor(a!)
+  python_module: nn
+  dispatch:
+    CPU: multi_margin_loss_cpu_out
+    CUDA: multi_margin_loss_cuda_out
+
+- func: multi_margin_loss(Tensor self, Tensor target, Scalar p=1, Scalar margin=1, Tensor? weight=None, int reduction=Mean) -> Tensor
+  python_module: nn
+  dispatch:
+    CPU: multi_margin_loss_cpu
+    CUDA: multi_margin_loss_cuda
+
+- func: multi_margin_loss_backward.grad_input(Tensor grad_output, Tensor self, Tensor target, Scalar p, Scalar margin, Tensor? weight=None, int reduction=Mean, *, Tensor(a!) grad_input) -> Tensor(a!)
+  python_module: nn
+  dispatch:
+    CPU: multi_margin_loss_cpu_backward_out
+    CUDA: multi_margin_loss_cuda_backward_out
+
+- func: multi_margin_loss_backward(Tensor grad_output, Tensor self, Tensor target, Scalar p, Scalar margin, Tensor? weight=None, int reduction=Mean) -> Tensor
+  python_module: nn
+  dispatch:
+    CPU: multi_margin_loss_cpu_backward
+    CUDA: multi_margin_loss_cuda_backward
+
+- func: multilabel_margin_loss.out(Tensor self, Tensor target, int reduction=Mean, *, Tensor(a!) out) -> Tensor(a!)
+  python_module: nn
+
+- func: multilabel_margin_loss(Tensor self, Tensor target, int reduction=Mean) -> Tensor
+  python_module: nn
+
+- func: multilabel_margin_loss_forward.output(Tensor self, Tensor target, int reduction, *, Tensor(a!) output, Tensor(b!) is_target) -> (Tensor(a!), Tensor(b!))
+  python_module: nn
+  dispatch:
+    CPU: multilabel_margin_loss_forward_out_cpu
+    CUDA: multilabel_margin_loss_forward_out_cuda
+
+- func: multilabel_margin_loss_forward(Tensor self, Tensor target, int reduction) -> (Tensor output, Tensor is_target)
+  python_module: nn
+  dispatch:
+    CPU: multilabel_margin_loss_forward_cpu
+    CUDA: multilabel_margin_loss_forward_cuda
+
+- func: multilabel_margin_loss_backward.grad_input(Tensor grad_output, Tensor self, Tensor target, int reduction, Tensor is_target, *, Tensor(a!) grad_input) -> Tensor(a!)
+  python_module: nn
+  dispatch:
+    CPU: multilabel_margin_loss_backward_cpu_out
+    CUDA: multilabel_margin_loss_backward_cuda_out
+
+- func: multilabel_margin_loss_backward(Tensor grad_output, Tensor self, Tensor target, int reduction, Tensor is_target) -> Tensor
+  python_module: nn
+  dispatch:
+    CPU: multilabel_margin_loss_backward_cpu
+    CUDA: multilabel_margin_loss_backward_cuda
+
+- func: nll_loss.out(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, SymInt ignore_index=-100, *, Tensor(a!) out) -> Tensor(a!)
+  python_module: nn
+
+- func: nll_loss_nd(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, SymInt ignore_index=-100) -> Tensor
+  python_module: nn
+  dispatch:
+    CompositeImplicitAutograd: nll_loss_nd_symint
+
+- func: nll_loss(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, SymInt ignore_index=-100) -> Tensor
+  python_module: nn
+  dispatch:
+    CompositeImplicitAutograd: nll_loss_symint
+
+- func: nll_loss_forward.output(Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index, *, Tensor(a!) output, Tensor(b!) total_weight) -> (Tensor(a!), Tensor(b!))
+  python_module: nn
+  structured: True
+  dispatch:
+    CPU: nll_loss_forward_out_cpu
+    CUDA: nll_loss_forward_out_cuda
+    MPS: nll_loss_forward_out_mps
+
+- func: nll_loss_forward(Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index) -> (Tensor output, Tensor total_weight)
+  python_module: nn
+  structured_delegate: nll_loss_forward.output
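+
+# nll_loss consumes log-probabilities; composed with log_softmax it yields
+# cross-entropy, which is how F.cross_entropy is usually described.
+# Illustrative equivalence:
+#
+#   import torch
+#   import torch.nn.functional as F
+#   logits = torch.randn(8, 5); target = torch.randint(5, (8,))
+#   a = F.nll_loss(F.log_softmax(logits, dim=1), target)
+#   b = F.cross_entropy(logits, target)
+#   assert torch.allclose(a, b)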
+
+- func: nll_loss_backward.grad_input(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index, Tensor total_weight, *, Tensor(a!) grad_input) -> Tensor(a!)
+  python_module: nn
+  structured: True
+  dispatch:
+    CPU: nll_loss_backward_out_cpu
+    CUDA: nll_loss_backward_out_cuda
+    MPS: nll_loss_backward_out_mps
+
+- func: nll_loss_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index, Tensor total_weight) -> Tensor
+  python_module: nn
+  structured_delegate: nll_loss_backward.grad_input
+
+- func: nll_loss2d.out(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, SymInt ignore_index=-100, *, Tensor(a!) out) -> Tensor(a!)
+  python_module: nn
+
+- func: nll_loss2d(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, SymInt ignore_index=-100) -> Tensor
+  python_module: nn
+  dispatch:
+    CompositeImplicitAutograd: nll_loss2d_symint
+
+- func: nll_loss2d_forward.output(Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index, *, Tensor(a!) output, Tensor(b!) total_weight) -> (Tensor(a!), Tensor(b!))
+  python_module: nn
+  dispatch:
+    CPU: nll_loss2d_forward_out_cpu
+    CUDA: nll_loss2d_forward_out_cuda
+    MPS: nll_loss2d_forward_out_mps
+
+- func: nll_loss2d_forward(Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index) -> (Tensor output, Tensor total_weight)
+  python_module: nn
+  dispatch:
+    CPU: nll_loss2d_forward_cpu
+    CUDA: nll_loss2d_forward_cuda
+    MPS: nll_loss2d_forward_mps
+
+- func: nll_loss2d_backward.grad_input(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index, Tensor total_weight, *, Tensor(a!) grad_input) -> Tensor(a!)
+  python_module: nn
+  dispatch:
+    CPU: nll_loss2d_backward_out_cpu
+    CUDA: nll_loss2d_backward_out_cuda
+    MPS: nll_loss2d_backward_out_mps
+
+- func: nll_loss2d_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index, Tensor total_weight) -> Tensor
+  python_module: nn
+  dispatch:
+    CPU: nll_loss2d_backward_cpu
+    CUDA: nll_loss2d_backward_cuda
+    MPS: nll_loss2d_backward_mps
+
+- func: smooth_l1_loss.out(Tensor self, Tensor target, int reduction=Mean, float beta=1.0, *, Tensor(a!) out) -> Tensor(a!)
+  device_check: NoCheck # TensorIterator
+  structured: True
+  structured_inherits: TensorIteratorBase
+  python_module: nn
+  dispatch:
+    CPU, CUDA: smooth_l1_loss_out
+    MPS: smooth_l1_loss_out_mps
+
+- func: smooth_l1_loss(Tensor self, Tensor target, int reduction=Mean, float beta=1.0) -> Tensor
+  device_check: NoCheck # TensorIterator
+  structured_delegate: smooth_l1_loss.out
+  python_module: nn
+
+- func: smooth_l1_loss_backward.grad_input(Tensor grad_output, Tensor self, Tensor target, int reduction, float beta, *, Tensor(a!) grad_input) -> Tensor(a!)
+  python_module: nn
+  dispatch:
+    CPU: smooth_l1_loss_backward_out
+    CUDA: smooth_l1_loss_backward_out
+    MPS: smooth_l1_loss_backward_out_mps
+
+- func: smooth_l1_loss_backward(Tensor grad_output, Tensor self, Tensor target, int reduction, float beta) -> Tensor
+  python_module: nn
+  dispatch:
+    CompositeExplicitAutograd: smooth_l1_loss_backward
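+
+# smooth_l1_loss (above) and huber_loss (below) share the same quadratic-to-
+# linear shape: smooth L1 is 0.5*d^2/beta for |d| < beta, else |d| - 0.5*beta,
+# while Huber is 0.5*d^2 for |d| < delta, else delta*(|d| - 0.5*delta); so for
+# beta == delta they differ by exactly a factor of delta. Illustrative:
+#
+#   import torch
+#   import torch.nn.functional as F
+#   x, y = torch.randn(10), torch.randn(10)
+#   s = F.smooth_l1_loss(x, y, beta=2.0)
+#   h = F.huber_loss(x, y, delta=2.0)
+#   assert torch.allclose(h, 2.0 * s)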
+
+- func: huber_loss.out(Tensor self, Tensor target, int reduction=Mean, float delta=1.0, *, Tensor(a!) out) -> Tensor(a!)
+  python_module: nn
+  dispatch:
+    CPU, CUDA: huber_loss_out
+    MPS: huber_loss_out_mps
+
+- func: huber_loss(Tensor self, Tensor target, int reduction=Mean, float delta=1.0) -> Tensor
+  python_module: nn
+  dispatch:
+    CPU, CUDA: huber_loss
+    MPS: huber_loss_mps
+
+- func: huber_loss_backward.out(Tensor grad_output, Tensor self, Tensor target, int reduction, float delta, *, Tensor(a!) grad_input) -> Tensor(a!)
+  python_module: nn
+  dispatch:
+    CPU, CUDA: huber_loss_backward_out
+    MPS: huber_loss_backward_out_mps
+
+- func: huber_loss_backward(Tensor grad_output, Tensor self, Tensor target, int reduction, float delta) -> Tensor
+  python_module: nn
+  dispatch:
+    CompositeExplicitAutograd: huber_loss_backward
+
+- func: soft_margin_loss.out(Tensor self, Tensor target, int reduction=Mean, *, Tensor(a!) out) -> Tensor(a!)
+  python_module: nn
+  dispatch:
+    CompositeExplicitAutograd: soft_margin_loss_out
+
+- func: soft_margin_loss(Tensor self, Tensor target, int reduction=Mean) -> Tensor
+  python_module: nn
+  dispatch:
+    CompositeExplicitAutograd: soft_margin_loss
+
+- func: soft_margin_loss_backward.grad_input(Tensor grad_output, Tensor self, Tensor target, int reduction, *, Tensor(a!) grad_input) -> Tensor(a!)
+  python_module: nn
+  dispatch:
+    CompositeExplicitAutograd: soft_margin_loss_backward_out
+
+- func: soft_margin_loss_backward(Tensor grad_output, Tensor self, Tensor target, int reduction) -> Tensor
+  python_module: nn
+  dispatch:
+    CompositeExplicitAutograd: soft_margin_loss_backward
+
+- func: elu.out(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1, *, Tensor(a!) out) -> Tensor(a!)
+  structured: True
+  structured_inherits: TensorIteratorBase
+  device_check: NoCheck # TensorIterator
+  python_module: nn
+  dispatch:
+    CPU, CUDA, MPS: elu_out
+
+- func: elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> Tensor
+  structured_delegate: elu.out
+  device_check: NoCheck # TensorIterator
+  python_module: nn
+  tags: [core, pointwise]
+
+- func: elu_backward.grad_input(Tensor grad_output, Scalar alpha, Scalar scale, Scalar input_scale, bool is_result, Tensor self_or_result, *, Tensor(a!) grad_input) -> Tensor(a!)
+  structured: True
+  structured_inherits: TensorIteratorBase
+  python_module: nn
+  dispatch:
+    CPU, CUDA, MPS: elu_backward_out
+
+- func: elu_backward(Tensor grad_output, Scalar alpha, Scalar scale, Scalar input_scale, bool is_result, Tensor self_or_result) -> Tensor
+  structured_delegate: elu_backward.grad_input
+  python_module: nn
+
+- func: elu_(Tensor(a!) self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> Tensor(a!)
+  structured_delegate: elu.out
+  device_check: NoCheck # TensorIterator
+  python_module: nn
+
+- func: glu.out(Tensor self, int dim=-1, *, Tensor(a!) out) -> Tensor(a!)
+  structured: True
+  structured_inherits: TensorIteratorBase
+  python_module: nn
+  dispatch:
+    CPU, CUDA: glu_out
+    MPS: glu_out_mps
+
+- func: glu(Tensor self, int dim=-1) -> Tensor
+  structured_delegate: glu.out
+  device_check: NoCheck # TensorIterator
+  python_module: nn
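+
+# glu splits its input in half along `dim` and gates one half with the
+# sigmoid of the other: glu(x) = a * sigmoid(b) where x = [a, b].
+# Illustrative:
+#
+#   import torch
+#   x = torch.randn(4, 6)
+#   a, b = x.chunk(2, dim=-1)
+#   assert torch.allclose(torch.nn.functional.glu(x, dim=-1), a * b.sigmoid())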
+ python_module: nn + dispatch: + CPU: glu_backward_cpu_out + CUDA: glu_backward_cuda_out + MPS: glu_backward_mps_out + +- func: glu_backward(Tensor grad_output, Tensor self, int dim) -> Tensor + python_module: nn + dispatch: + CPU: glu_backward_cpu + CUDA: glu_backward_cuda + MPS: glu_backward_mps + +- func: glu_jvp(Tensor glu, Tensor x, Tensor dx, int dim) -> Tensor + python_module: nn + dispatch: + CPU, CUDA: glu_jvp + autogen: glu_jvp.out + +- func: glu_backward_jvp(Tensor grad_x, Tensor grad_glu, Tensor x, Tensor dgrad_glu, Tensor dx, int dim) -> Tensor + python_module: nn + dispatch: + CPU, CUDA: glu_backward_jvp + autogen: glu_backward_jvp.out + +- func: hardsigmoid.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + CPU, CUDA, MPS: hardsigmoid_out + QuantizedCPU: hardsigmoid_out_quantized_cpu + +- func: hardsigmoid(Tensor self) -> Tensor + structured_delegate: hardsigmoid.out + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + QuantizedCPU: hardsigmoid_quantized_cpu + tags: pointwise + +- func: hardsigmoid_(Tensor(a!) self) -> Tensor(a!) + structured_delegate: hardsigmoid.out + device_check: NoCheck # TensorIterator + python_module: nn + +- func: hardsigmoid_backward.grad_input(Tensor grad_output, Tensor self, *, Tensor(a!) grad_input) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + python_module: nn + dispatch: + CPU, CUDA, MPS: hardsigmoid_backward_out + +- func: hardsigmoid_backward(Tensor grad_output, Tensor self) -> Tensor + structured_delegate: hardsigmoid_backward.grad_input + python_module: nn + +- func: hardtanh.out(Tensor self, Scalar min_val=-1, Scalar max_val=1, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + CPU, CUDA, MPS: hardtanh_out + QuantizedCPU: hardtanh_out_quantized_cpu + +- func: hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> Tensor + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + CPU, CUDA, MPS: hardtanh + QuantizedCPU: hardtanh_quantized_cpu + tags: [pointwise, core] + +- func: hardtanh_backward.grad_input(Tensor grad_output, Tensor self, Scalar min_val, Scalar max_val, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + dispatch: + CPU, CUDA: hardtanh_backward_out + MPS: hardtanh_backward_out_mps + +- func: hardtanh_backward(Tensor grad_output, Tensor self, Scalar min_val, Scalar max_val) -> Tensor + python_module: nn + dispatch: + CPU, CUDA: hardtanh_backward + MPS: hardtanh_backward_mps + +- func: hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> Tensor(a!) + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + CPU, CUDA, MPS: hardtanh_ + QuantizedCPU: hardtanh_quantized_cpu_ + +- func: hardswish.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + CPU, CUDA, MPS: hardswish_out + +- func: hardswish(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + CPU, CUDA, MPS: hardswish + +- func: hardswish_(Tensor(a!) self) -> Tensor(a!) 
+ device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + CPU, CUDA, MPS: hardswish_ + +- func: hardswish_backward(Tensor grad_output, Tensor self) -> Tensor + python_module: nn + dispatch: + CPU, CUDA, MPS: hardswish_backward + autogen: hardswish_backward.out + +- func: leaky_relu.out(Tensor self, Scalar negative_slope=0.01, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + CPU, CUDA, MPS: leaky_relu_out + QuantizedCPU: leaky_relu_out_quantized_cpu + +- func: leaky_relu(Tensor self, Scalar negative_slope=0.01) -> Tensor + structured_delegate: leaky_relu.out + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + QuantizedCPU: leaky_relu_quantized_cpu + tags: core + +- func: leaky_relu_backward.grad_input(Tensor grad_output, Tensor self, Scalar negative_slope, bool self_is_result, *, Tensor(a!) grad_input) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + python_module: nn + dispatch: + CPU, CUDA, MPS: leaky_relu_backward_out + +- func: leaky_relu_backward(Tensor grad_output, Tensor self, Scalar negative_slope, bool self_is_result) -> Tensor + structured_delegate: leaky_relu_backward.grad_input + python_module: nn + +- func: leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> Tensor(a!) + structured_delegate: leaky_relu.out + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + QuantizedCPU: leaky_relu_quantized_cpu_ + +- func: log_sigmoid.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + python_module: nn + +- func: log_sigmoid(Tensor self) -> Tensor + device_check: NoCheck # TensorIterator + python_module: nn + +- func: log_sigmoid_forward.output(Tensor self, *, Tensor(a!) output, Tensor(b!) buffer) -> (Tensor(a!), Tensor(b!)) + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + CPU: log_sigmoid_forward_out_cpu + CUDA: log_sigmoid_forward_out_cuda + MPS: log_sigmoid_forward_out_mps + +- func: log_sigmoid_forward(Tensor self) -> (Tensor output, Tensor buffer) + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + CPU: log_sigmoid_forward_cpu + CUDA: log_sigmoid_forward_cuda + MPS: log_sigmoid_forward_mps + +- func: log_sigmoid_backward.grad_input(Tensor grad_output, Tensor self, Tensor buffer, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + dispatch: + CPU: log_sigmoid_backward_cpu_out + CUDA: log_sigmoid_backward_cuda_out + MPS: log_sigmoid_backward_mps_out + +- func: log_sigmoid_backward(Tensor grad_output, Tensor self, Tensor buffer) -> Tensor + python_module: nn + dispatch: + CPU: log_sigmoid_backward_cpu + CUDA: log_sigmoid_backward_cuda + MPS: log_sigmoid_backward_mps + +- func: rrelu_with_noise.out(Tensor self, Tensor(b!) noise, Scalar lower=0.125, Scalar upper=0.3333333333333333, bool training=False, Generator? generator=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + tags: nondeterministic_seeded + dispatch: + CPU: rrelu_with_noise_out_cpu + CUDA: rrelu_with_noise_out_cuda + +- func: rrelu_with_noise(Tensor self, Tensor(b!) noise, Scalar lower=0.125, Scalar upper=0.3333333333333333, bool training=False, Generator? 
generator=None) -> Tensor + python_module: nn + dispatch: + CPU: rrelu_with_noise_cpu + CUDA: rrelu_with_noise_cuda + tags: nondeterministic_seeded + autogen: rrelu_with_noise_functional + +- func: rrelu_with_noise_backward(Tensor grad_output, Tensor self, Tensor noise, Scalar lower, Scalar upper, bool training, bool self_is_result) -> Tensor + python_module: nn + dispatch: + CompositeExplicitAutograd: rrelu_with_noise_backward + autogen: rrelu_with_noise_backward.out + +- func: rrelu_with_noise_(Tensor(a!) self, Tensor(b!) noise, Scalar lower=0.125, Scalar upper=0.3333333333333333, bool training=False, Generator? generator=None) -> Tensor(a!) + python_module: nn + tags: nondeterministic_seeded + dispatch: + CPU: rrelu_with_noise_cpu_ + CUDA: rrelu_with_noise_cuda_ + +- func: softplus.out(Tensor self, Scalar beta=1, Scalar threshold=20, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + CPU, CUDA: softplus_out + MPS: softplus_out_mps + +- func: softplus(Tensor self, Scalar beta=1, Scalar threshold=20) -> Tensor + structured_delegate: softplus.out + device_check: NoCheck # TensorIterator + python_module: nn + tags: pointwise + +- func: softplus_backward.grad_input(Tensor grad_output, Tensor self, Scalar beta, Scalar threshold, *, Tensor(a!) grad_input) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + python_module: nn + dispatch: + CPU, CUDA: softplus_backward_out + MPS: softplus_backward_out_mps + +- func: softplus_backward(Tensor grad_output, Tensor self, Scalar beta, Scalar threshold) -> Tensor + structured_delegate: softplus_backward.grad_input + python_module: nn + +- func: softshrink.out(Tensor self, Scalar lambd=0.5, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + device_check: NoCheck # TensorIterator + python_module: nn + dispatch: + CPU, CUDA, MPS: softshrink_out + +- func: softshrink(Tensor self, Scalar lambd=0.5) -> Tensor + structured_delegate: softshrink.out + device_check: NoCheck # TensorIterator + python_module: nn + tags: pointwise + +- func: softshrink_backward.grad_input(Tensor grad_output, Tensor self, Scalar lambd, *, Tensor(a!) grad_input) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + python_module: nn + dispatch: + CPU, CUDA, MPS: softshrink_backward_out + +- func: softshrink_backward(Tensor grad_output, Tensor self, Scalar lambd) -> Tensor + structured_delegate: softshrink_backward.grad_input + python_module: nn + +- func: adaptive_avg_pool2d.out(Tensor self, SymInt[2] output_size, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + dispatch: + CPU: adaptive_avg_pool2d_out_cpu + CUDA: adaptive_avg_pool2d_out_cuda + MPS: adaptive_avg_pool2d_out_mps + MkldnnCPU: mkldnn_adaptive_avg_pool2d_out_stub + +- func: adaptive_avg_pool2d(Tensor self, SymInt[2] output_size) -> Tensor + python_module: nn + dispatch: + CompositeImplicitAutograd: adaptive_avg_pool2d_symint + +- func: mkldnn_adaptive_avg_pool2d(Tensor self, int[2] output_size) -> Tensor + dispatch: + MkldnnCPU: mkldnn_adaptive_avg_pool2d + +- func: mkldnn_adaptive_avg_pool2d.out(Tensor self, int[2] output_size, *, Tensor(a!) out) -> Tensor(a!) 
+ dispatch: + MkldnnCPU: mkldnn_adaptive_avg_pool2d_out + +- func: mkldnn_adaptive_avg_pool2d_backward(Tensor grad_output, Tensor self) -> Tensor + dispatch: + MkldnnCPU: mkldnn_adaptive_avg_pool2d_backward + autogen: mkldnn_adaptive_avg_pool2d_backward.out + +- func: _adaptive_avg_pool2d(Tensor self, SymInt[2] output_size) -> Tensor + dispatch: + CPU: adaptive_avg_pool2d_cpu + CUDA: adaptive_avg_pool2d_cuda + MPS: adaptive_avg_pool2d_mps + QuantizedCPU: adaptive_avg_pool2d_quantized_cpu + QuantizedCUDA: adaptive_avg_pool2d_quantized_cuda + autogen: _adaptive_avg_pool2d.out + tags: core + +- func: _adaptive_avg_pool2d_backward(Tensor grad_output, Tensor self) -> Tensor + python_module: nn + dispatch: + CPU: adaptive_avg_pool2d_backward_cpu + CUDA: adaptive_avg_pool2d_backward_cuda + MPS: adaptive_avg_pool2d_backward_mps + autogen: _adaptive_avg_pool2d_backward.out + tags: core + +- func: adaptive_avg_pool3d.out(Tensor self, SymInt[3] output_size, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + dispatch: + CPU: adaptive_avg_pool3d_out_cpu + CUDA: adaptive_avg_pool3d_out_cuda + QuantizedCPU: adaptive_avg_pool3d_out_quantized_cpu + +- func: adaptive_avg_pool3d(Tensor self, SymInt[3] output_size) -> Tensor + python_module: nn + dispatch: + CompositeImplicitAutograd: adaptive_avg_pool3d_symint + +- func: _adaptive_avg_pool3d(Tensor self, SymInt[3] output_size) -> Tensor + dispatch: + CPU: adaptive_avg_pool3d_cpu + CUDA: adaptive_avg_pool3d_cuda + QuantizedCPU: adaptive_avg_pool3d_quantized_cpu + autogen: _adaptive_avg_pool3d.out + tags: core + +- func: adaptive_avg_pool3d_backward.grad_input(Tensor grad_output, Tensor self, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + dispatch: + CPU: adaptive_avg_pool3d_backward_out_cpu + CUDA: adaptive_avg_pool3d_backward_out_cuda + +- func: _adaptive_avg_pool3d_backward(Tensor grad_output, Tensor self) -> Tensor + python_module: nn + dispatch: + CPU: adaptive_avg_pool3d_backward_cpu + CUDA: adaptive_avg_pool3d_backward_cuda + autogen: _adaptive_avg_pool3d_backward.out + +# Return: (Tensor output, Tensor indices) +- func: adaptive_max_pool2d.out(Tensor self, int[2] output_size, *, Tensor(a!) out, Tensor(b!) indices) -> (Tensor(a!), Tensor(b!)) + python_module: nn + structured: True + dispatch: + CPU: adaptive_max_pool2d_out_cpu + CUDA: adaptive_max_pool2d_out_cuda + MPS: adaptive_max_pool2d_out_mps + +# Return: (Tensor output, Tensor indices) +- func: adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor) + python_module: nn + structured_delegate: adaptive_max_pool2d.out + +- func: adaptive_max_pool2d_backward.grad_input(Tensor grad_output, Tensor self, Tensor indices, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: adaptive_max_pool2d_backward_out_cpu + CUDA: adaptive_max_pool2d_backward_out_cuda + MPS: adaptive_max_pool2d_backward_out_mps + +- func: adaptive_max_pool2d_backward(Tensor grad_output, Tensor self, Tensor indices) -> Tensor + python_module: nn + structured_delegate: adaptive_max_pool2d_backward.grad_input + +# Return: (Tensor output, Tensor indices) +- func: adaptive_max_pool3d.out(Tensor self, int[3] output_size, *, Tensor(a!) out, Tensor(b!) 
indices) -> (Tensor(a!), Tensor(b!)) + python_module: nn + structured: True + dispatch: + CPU: adaptive_max_pool3d_out_cpu + CUDA: adaptive_max_pool3d_out_cuda + +# Return: (Tensor output, Tensor indices) +- func: adaptive_max_pool3d(Tensor self, int[3] output_size) -> (Tensor, Tensor) + python_module: nn + structured_delegate: adaptive_max_pool3d.out + +- func: adaptive_max_pool3d_backward.grad_input(Tensor grad_output, Tensor self, Tensor indices, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: adaptive_max_pool3d_backward_out_cpu + CUDA: adaptive_max_pool3d_backward_out_cuda + +- func: adaptive_max_pool3d_backward(Tensor grad_output, Tensor self, Tensor indices) -> Tensor + python_module: nn + structured_delegate: adaptive_max_pool3d_backward.grad_input + +- func: avg_pool2d.out(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + structured: True + precomputed: + - kernel_size -> int kH, int kW + - stride -> int dH, int dW + - padding -> int padH, int padW + dispatch: + CPU: avg_pool2d_out_cpu + CUDA: avg_pool2d_out_cuda + MPS: avg_pool2d_out_mps + MkldnnCPU: mkldnn_avg_pool2d_out + +- func: avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> Tensor + python_module: nn + structured_delegate: avg_pool2d.out + dispatch: + MkldnnCPU: mkldnn_avg_pool2d + QuantizedCPU: avg_pool2d_quantized_cpu + tags: core + +- func: avg_pool2d_backward.grad_input(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride, int[2] padding, bool ceil_mode, bool count_include_pad, int? divisor_override, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: avg_pool2d_backward_out_cpu + CUDA: avg_pool2d_backward_out_cuda + MPS: avg_pool2d_backward_out_mps + MkldnnCPU: mkldnn_avg_pool2d_backward_out + +- func: avg_pool2d_backward(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride, int[2] padding, bool ceil_mode, bool count_include_pad, int? divisor_override) -> Tensor + python_module: nn + structured_delegate: avg_pool2d_backward.grad_input + dispatch: + MkldnnCPU: mkldnn_avg_pool2d_backward + tags: core + +- func: avg_pool3d.out(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=0, bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: avg_pool3d_out_cpu + CUDA: avg_pool3d_out_cuda + MPS: avg_pool3d_out_mps + MkldnnCPU: mkldnn_avg_pool3d_out + +- func: avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=0, bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> Tensor + python_module: nn + structured_delegate: avg_pool3d.out + dispatch: + MkldnnCPU: mkldnn_avg_pool3d + QuantizedCPU: avg_pool3d_quantized_cpu + tags: core + +- func: avg_pool3d_backward.grad_input(Tensor grad_output, Tensor self, int[3] kernel_size, int[3] stride, int[3] padding, bool ceil_mode, bool count_include_pad, int? divisor_override, *, Tensor(a!) grad_input) -> Tensor(a!) 
+ python_module: nn + structured: True + dispatch: + CPU: avg_pool3d_backward_out_cpu + CUDA: avg_pool3d_backward_out_cuda + MPS: avg_pool3d_backward_out_mps + MkldnnCPU: mkldnn_avg_pool3d_backward_out + +- func: avg_pool3d_backward(Tensor grad_output, Tensor self, int[3] kernel_size, int[3] stride, int[3] padding, bool ceil_mode, bool count_include_pad, int? divisor_override) -> Tensor + python_module: nn + structured_delegate: avg_pool3d_backward.grad_input + dispatch: + MkldnnCPU: mkldnn_avg_pool3d_backward + +# Return: (Tensor output, Tensor indices) +- func: fractional_max_pool2d.output(Tensor self, int[2] kernel_size, int[2] output_size, Tensor random_samples, *, Tensor(a!) output, Tensor(b!) indices) -> (Tensor(a!), Tensor(b!)) + python_module: nn + structured: True + dispatch: + CPU: fractional_max_pool2d_out_cpu + CUDA: fractional_max_pool2d_out_cuda + +# Return: (Tensor output, Tensor indices) +- func: fractional_max_pool2d(Tensor self, int[2] kernel_size, int[2] output_size, Tensor random_samples) -> (Tensor, Tensor) + python_module: nn + structured_delegate: fractional_max_pool2d.output + +- func: fractional_max_pool2d_backward.grad_input(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] output_size, Tensor indices, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: fractional_max_pool2d_backward_cpu + CUDA: fractional_max_pool2d_backward_cuda + +- func: fractional_max_pool2d_backward(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] output_size, Tensor indices) -> Tensor + python_module: nn + structured_delegate: fractional_max_pool2d_backward.grad_input + +# Return: (Tensor output, Tensor indices) +- func: fractional_max_pool3d.output(Tensor self, int[3] kernel_size, int[3] output_size, Tensor random_samples, *, Tensor(a!) output, Tensor(b!) indices) -> (Tensor(a!), Tensor(b!)) + python_module: nn + structured: True + precomputed: + - kernel_size -> int poolSizeT, int poolSizeH, int poolSizeW + - output_size -> int outputT, int outputH, int outputW + - int numBatch, int numPlanes, int inputT, int inputH, int inputW + dispatch: + CPU: fractional_max_pool3d_out_cpu + CUDA: fractional_max_pool3d_out_cuda + +# Return: (Tensor output, Tensor indices) +- func: fractional_max_pool3d(Tensor self, int[3] kernel_size, int[3] output_size, Tensor random_samples) -> (Tensor, Tensor) + python_module: nn + structured_delegate: fractional_max_pool3d.output + +- func: fractional_max_pool3d_backward.grad_input(Tensor grad_output, Tensor self, int[3] kernel_size, int[3] output_size, Tensor indices, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + dispatch: + CPU: fractional_max_pool3d_backward_out_cpu + CUDA: fractional_max_pool3d_backward_out_cuda + +- func: fractional_max_pool3d_backward(Tensor grad_output, Tensor self, int[3] kernel_size, int[3] output_size, Tensor indices) -> Tensor + python_module: nn + dispatch: + CPU: fractional_max_pool3d_backward_cpu + CUDA: fractional_max_pool3d_backward_cuda + +# Return: (Tensor output, Tensor indices) +- func: max_pool2d_with_indices.out(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, int[2] dilation=1, bool ceil_mode=False, *, Tensor(a!) out, Tensor(b!) 
indices) -> (Tensor(a!), Tensor(b!)) + python_module: nn + structured: True + dispatch: + CPU: max_pool2d_with_indices_out_cpu + CUDA: max_pool2d_with_indices_out_cuda + MPS: max_pool2d_with_indices_out_mps + +# Return: (Tensor output, Tensor indices) +- func: max_pool2d_with_indices(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, int[2] dilation=1, bool ceil_mode=False) -> (Tensor, Tensor) + python_module: nn + structured_delegate: max_pool2d_with_indices.out + tags: core + +- func: max_pool2d_with_indices_backward.grad_input(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride, int[2] padding, int[2] dilation, bool ceil_mode, Tensor indices, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: max_pool2d_with_indices_backward_out_cpu + CUDA: max_pool2d_with_indices_backward_out_cuda + MPS: max_pool2d_with_indices_backward_out_mps + +- func: max_pool2d_with_indices_backward(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride, int[2] padding, int[2] dilation, bool ceil_mode, Tensor indices) -> Tensor + python_module: nn + structured_delegate: max_pool2d_with_indices_backward.grad_input + tags: core + +# Return: (Tensor output, Tensor indices) +- func: max_pool3d_with_indices.out(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=0, int[3] dilation=1, bool ceil_mode=False, *, Tensor(a!) out, Tensor(b!) indices) -> (Tensor(a!), Tensor(b!)) + python_module: nn + dispatch: + CPU: max_pool3d_with_indices_out_cpu + CUDA: max_pool3d_with_indices_out_cuda + MPS: max_pool3d_with_indices_out_mps + +# Return: (Tensor output, Tensor indices) +- func: max_pool3d_with_indices(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=0, int[3] dilation=1, bool ceil_mode=False) -> (Tensor, Tensor) + python_module: nn + dispatch: + CPU: max_pool3d_with_indices_cpu + CUDA: max_pool3d_with_indices_cuda + MPS: max_pool3d_with_indices_mps + tags: core + +- func: max_pool3d_with_indices_backward.grad_input(Tensor grad_output, Tensor self, int[3] kernel_size, int[3] stride, int[3] padding, int[3] dilation, bool ceil_mode, Tensor indices, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + dispatch: + CPU: max_pool3d_with_indices_backward_out_cpu + CUDA: max_pool3d_with_indices_backward_out_cuda + MPS: max_pool3d_with_indices_backward_out_mps + +- func: max_pool3d_with_indices_backward(Tensor grad_output, Tensor self, int[3] kernel_size, int[3] stride, int[3] padding, int[3] dilation, bool ceil_mode, Tensor indices) -> Tensor + python_module: nn + dispatch: + CPU: max_pool3d_with_indices_backward_cpu + CUDA: max_pool3d_with_indices_backward_cuda + MPS: max_pool3d_with_indices_backward_mps + +- func: max_unpool2d.out(Tensor self, Tensor indices, SymInt[2] output_size, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + dispatch: + CPU: max_unpooling2d_forward_out_cpu + CUDA: max_unpooling2d_forward_out_cuda + MPS: max_unpooling2d_forward_out_mps + +- func: max_unpool2d(Tensor self, Tensor indices, SymInt[2] output_size) -> Tensor + python_module: nn + dispatch: + CPU: max_unpooling2d_forward_cpu + CUDA: max_unpooling2d_forward_cuda + MPS: max_unpooling2d_forward_mps + +- func: max_unpool3d.out(Tensor self, Tensor indices, SymInt[3] output_size, int[3] stride, int[3] padding, *, Tensor(a!) out) -> Tensor(a!) 
+ python_module: nn + dispatch: + CPU: max_unpooling3d_forward_out_cpu + CUDA: max_unpooling3d_forward_out_cuda + MPS: max_unpooling3d_forward_out_mps + +- func: max_unpool3d(Tensor self, Tensor indices, SymInt[3] output_size, int[3] stride, int[3] padding) -> Tensor + python_module: nn + dispatch: + CPU: max_unpooling3d_forward_cpu + CUDA: max_unpooling3d_forward_cuda + MPS: max_unpooling3d_forward_mps + +- func: reflection_pad1d.out(Tensor self, SymInt[2] padding, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: reflection_pad1d_out_cpu + QuantizedCPU: reflection_pad1d_out_quantized_cpu + CUDA: reflection_pad1d_out_cuda + MPS: reflection_pad1d_out_mps + +- func: reflection_pad1d(Tensor self, SymInt[2] padding) -> Tensor + python_module: nn + structured_delegate: reflection_pad1d.out + tags: core + +- func: reflection_pad1d_backward.grad_input(Tensor grad_output, Tensor self, SymInt[2] padding, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: reflection_pad1d_backward_out_cpu + CUDA: reflection_pad1d_backward_out_cuda + MPS: reflection_pad1d_backward_out_mps + +- func: reflection_pad1d_backward(Tensor grad_output, Tensor self, SymInt[2] padding) -> Tensor + python_module: nn + structured_delegate: reflection_pad1d_backward.grad_input + +- func: reflection_pad2d.out(Tensor self, SymInt[4] padding, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + dispatch: + CPU, QuantizedCPU: reflection_pad2d_out_cpu + CUDA: reflection_pad2d_out_cuda + MPS: reflection_pad2d_out_mps + +- func: reflection_pad2d(Tensor self, SymInt[4] padding) -> Tensor + python_module: nn + dispatch: + CPU: reflection_pad2d_cpu + QuantizedCPU: reflection_pad2d_quantized_cpu + CUDA: reflection_pad2d_cuda + MPS: reflection_pad2d_mps + tags: core + +- func: reflection_pad2d_backward.grad_input(Tensor grad_output, Tensor self, SymInt[4] padding, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + dispatch: + CPU: reflection_pad2d_backward_out_cpu + CUDA: reflection_pad2d_backward_out_cuda + MPS: reflection_pad2d_backward_out_mps + +- func: reflection_pad2d_backward(Tensor grad_output, Tensor self, SymInt[4] padding) -> Tensor + python_module: nn + dispatch: + CPU: reflection_pad2d_backward_cpu + CUDA: reflection_pad2d_backward_cuda + MPS: reflection_pad2d_backward_mps + +- func: reflection_pad3d.out(Tensor self, SymInt[6] padding, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: reflection_pad3d_out_cpu + CUDA: reflection_pad3d_out_cuda + MPS: reflection_pad3d_out_mps + +- func: reflection_pad3d(Tensor self, SymInt[6] padding) -> Tensor + python_module: nn + structured_delegate: reflection_pad3d.out + tags: core + +- func: reflection_pad3d_backward.grad_input(Tensor grad_output, Tensor self, SymInt[6] padding, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: reflection_pad3d_backward_out_cpu + CUDA: reflection_pad3d_backward_out_cuda + MPS: reflection_pad3d_backward_out_mps + +- func: reflection_pad3d_backward(Tensor grad_output, Tensor self, SymInt[6] padding) -> Tensor + python_module: nn + structured_delegate: reflection_pad3d_backward.grad_input + +- func: replication_pad1d.out(Tensor self, SymInt[2] padding, *, Tensor(a!) out) -> Tensor(a!) 
+ python_module: nn
+ structured: True
+ dispatch:
+ CPU: replication_pad1d_out_cpu
+ CUDA: replication_pad1d_out_cuda
+ MPS: replication_pad1d_out_mps
+
+- func: replication_pad1d(Tensor self, SymInt[2] padding) -> Tensor
+ python_module: nn
+ structured_delegate: replication_pad1d.out
+
+- func: replication_pad1d_backward.grad_input(Tensor grad_output, Tensor self, SymInt[2] padding, *, Tensor(a!) grad_input) -> Tensor(a!)
+ python_module: nn
+ structured: True
+ dispatch:
+ CPU: replication_pad1d_backward_out_cpu
+ CUDA: replication_pad1d_backward_out_cuda
+ MPS: replication_pad1d_backward_out_mps
+
+- func: replication_pad1d_backward(Tensor grad_output, Tensor self, SymInt[2] padding) -> Tensor
+ python_module: nn
+ structured_delegate: replication_pad1d_backward.grad_input
+
+- func: replication_pad2d.out(Tensor self, SymInt[4] padding, *, Tensor(a!) out) -> Tensor(a!)
+ python_module: nn
+ structured: True
+ dispatch:
+ CPU: replication_pad2d_out_cpu
+ CUDA: replication_pad2d_out_cuda
+ MPS: replication_pad2d_out_mps
+
+- func: replication_pad2d(Tensor self, SymInt[4] padding) -> Tensor
+ python_module: nn
+ structured_delegate: replication_pad2d.out
+ tags: core
+
+- func: replication_pad2d_backward.grad_input(Tensor grad_output, Tensor self, SymInt[4] padding, *, Tensor(a!) grad_input) -> Tensor(a!)
+ python_module: nn
+ dispatch:
+ CPU: replication_pad2d_backward_out_cpu
+ CUDA: replication_pad2d_backward_out_cuda
+ MPS: replication_pad2d_backward_out_mps
+
+- func: replication_pad2d_backward(Tensor grad_output, Tensor self, SymInt[4] padding) -> Tensor
+ python_module: nn
+ dispatch:
+ CPU: replication_pad2d_backward_cpu
+ CUDA: replication_pad2d_backward_cuda
+ MPS: replication_pad2d_backward_mps
+
+- func: replication_pad3d.out(Tensor self, SymInt[6] padding, *, Tensor(a!) out) -> Tensor(a!)
+ python_module: nn
+ structured: True
+ dispatch:
+ CPU: replication_pad3d_out_cpu
+ CUDA: replication_pad3d_out_cuda
+ MPS: replication_pad3d_out_mps
+
+- func: replication_pad3d(Tensor self, SymInt[6] padding) -> Tensor
+ python_module: nn
+ structured_delegate: replication_pad3d.out
+ tags: core
+
+- func: replication_pad3d_backward.grad_input(Tensor grad_output, Tensor self, SymInt[6] padding, *, Tensor(a!) grad_input) -> Tensor(a!)
+ python_module: nn
+ dispatch:
+ CPU: replication_pad3d_backward_out_cpu
+ CUDA: replication_pad3d_backward_out_cuda
+ MPS: replication_pad3d_backward_out_mps
+
+- func: replication_pad3d_backward(Tensor grad_output, Tensor self, SymInt[6] padding) -> Tensor
+ python_module: nn
+ dispatch:
+ CPU: replication_pad3d_backward_cpu
+ CUDA: replication_pad3d_backward_cuda
+ MPS: replication_pad3d_backward_mps
+
+- func: _pad_circular(Tensor self, SymInt[] pad) -> Tensor
+ python_module: nn
+ dispatch:
+ CompositeImplicitAutograd: _pad_circular_symint
+
+- func: _pad_enum(Tensor self, SymInt[] pad, int mode, float? value=None) -> Tensor
+ python_module: nn
+ dispatch:
+ CompositeImplicitAutograd: _pad_enum_symint
+
+- func: pad(Tensor self, SymInt[] pad, str mode="constant", float? value=None) -> Tensor
+ python_module: nn
+ dispatch:
+ CompositeImplicitAutograd: pad_symint
+
+- func: upsample_linear1d.vec(Tensor input, SymInt[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor
+ python_module: nn
+ autogen: upsample_linear1d.vec_out
+
+- func: upsample_bilinear2d.vec(Tensor input, SymInt[]? output_size, bool align_corners, float[]?
scale_factors) -> Tensor + python_module: nn + autogen: upsample_bilinear2d.vec_out + tags: core + +- func: _upsample_bilinear2d_aa.vec(Tensor input, SymInt[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor + python_module: nn + autogen: _upsample_bilinear2d_aa.vec_out + +- func: upsample_trilinear3d.vec(Tensor input, SymInt[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor + python_module: nn + autogen: upsample_trilinear3d.vec_out + +- func: upsample_bicubic2d.vec(Tensor input, SymInt[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor + python_module: nn + autogen: upsample_bicubic2d.vec_out + +- func: _upsample_bicubic2d_aa.vec(Tensor input, SymInt[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor + python_module: nn + autogen: _upsample_bicubic2d_aa.vec_out + +- func: upsample_nearest1d.vec(Tensor input, SymInt[]? output_size, float[]? scale_factors) -> Tensor + python_module: nn + autogen: upsample_nearest1d.vec_out + +- func: _upsample_nearest_exact1d.vec(Tensor input, SymInt[]? output_size, float[]? scale_factors) -> Tensor + python_module: nn + autogen: _upsample_nearest_exact1d.vec_out + +- func: upsample_nearest2d.vec(Tensor input, SymInt[]? output_size, float[]? scale_factors) -> Tensor + python_module: nn + autogen: upsample_nearest2d.vec_out + tags: core + +- func: _upsample_nearest_exact2d.vec(Tensor input, SymInt[]? output_size, float[]? scale_factors) -> Tensor + python_module: nn + autogen: _upsample_nearest_exact2d.vec_out + +- func: upsample_nearest3d.vec(Tensor input, SymInt[]? output_size, float[]? scale_factors) -> Tensor + python_module: nn + autogen: upsample_nearest3d.vec_out + +- func: _upsample_nearest_exact3d.vec(Tensor input, SymInt[]? output_size, float[]? scale_factors) -> Tensor + python_module: nn + autogen: _upsample_nearest_exact3d.vec_out + +# NOTE: all of the non-"vec" upsample overloads are only kept for backward compatibility. +- func: upsample_linear1d.out(Tensor self, SymInt[1] output_size, bool align_corners, float? scales=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: upsample_linear1d_out_cpu + CUDA: upsample_linear1d_out_cuda + MPS: upsample_linear1d_out_mps + +- func: upsample_linear1d(Tensor self, SymInt[1] output_size, bool align_corners, float? scales=None) -> Tensor + python_module: nn + structured_delegate: upsample_linear1d.out + +- func: upsample_linear1d_backward.grad_input(Tensor grad_output, SymInt[1] output_size, SymInt[3] input_size, bool align_corners, float? scales=None, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: upsample_linear1d_backward_out_cpu + CUDA: upsample_linear1d_backward_out_cuda + MPS: upsample_linear1d_backward_out_mps + +- func: upsample_linear1d_backward(Tensor grad_output, SymInt[1] output_size, SymInt[3] input_size, bool align_corners, float? scales=None) -> Tensor + python_module: nn + structured_delegate: upsample_linear1d_backward.grad_input + +- func: upsample_bilinear2d.out(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: upsample_bilinear2d_out_cpu + CUDA: upsample_bilinear2d_out_cuda + MPS: upsample_bilinear2d_out_mps + +- func: upsample_bilinear2d(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? 
scales_w=None) -> Tensor + python_module: nn + structured_delegate: upsample_bilinear2d.out + dispatch: + QuantizedCPU: upsample_bilinear2d_quantized_cpu + +- func: upsample_bilinear2d_backward.grad_input(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: upsample_bilinear2d_backward_out_cpu + CUDA: upsample_bilinear2d_backward_out_cuda + MPS: upsample_bilinear2d_backward_out_mps + +- func: upsample_bilinear2d_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: upsample_bilinear2d_backward.grad_input + +- func: _upsample_bilinear2d_aa.out(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: _upsample_bilinear2d_aa_out_cpu + CUDA: _upsample_bilinear2d_aa_out_cuda + MPS: _upsample_bilinear2d_aa_out_mps + +- func: _upsample_bilinear2d_aa(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: _upsample_bilinear2d_aa.out + +- func: _upsample_bilinear2d_aa_backward.grad_input(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: _upsample_bilinear2d_aa_backward_out_cpu + CUDA: _upsample_bilinear2d_aa_backward_out_cuda + +- func: _upsample_bilinear2d_aa_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: _upsample_bilinear2d_aa_backward.grad_input + +- func: upsample_bicubic2d.out(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: upsample_bicubic2d_out_cpu + CUDA: upsample_bicubic2d_out_cuda + MPS: upsample_bicubic2d_out_mps + +- func: upsample_bicubic2d(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: upsample_bicubic2d.out + +- func: upsample_bicubic2d_backward.grad_input(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: upsample_bicubic2d_backward_out_cpu + CUDA: upsample_bicubic2d_backward_out_cuda + MPS: upsample_bicubic2d_backward_out_mps + +- func: upsample_bicubic2d_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: upsample_bicubic2d_backward.grad_input + +- func: _upsample_bicubic2d_aa.out(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) 
+ python_module: nn + structured: True + dispatch: + CPU: _upsample_bicubic2d_aa_out_cpu + CUDA: _upsample_bicubic2d_aa_out_cuda + MPS: _upsample_bicubic2d_aa_out_mps + +- func: _upsample_bicubic2d_aa(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: _upsample_bicubic2d_aa.out + +- func: _upsample_bicubic2d_aa_backward.grad_input(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: _upsample_bicubic2d_aa_backward_out_cpu + CUDA: _upsample_bicubic2d_aa_backward_out_cuda + +- func: _upsample_bicubic2d_aa_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: _upsample_bicubic2d_aa_backward.grad_input + +- func: upsample_trilinear3d.out(Tensor self, SymInt[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: upsample_trilinear3d_out_cpu + CUDA: upsample_trilinear3d_out_cuda + MPS: upsample_trilinear3d_out_mps + +- func: upsample_trilinear3d(Tensor self, SymInt[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: upsample_trilinear3d.out + +- func: upsample_trilinear3d_backward.grad_input(Tensor grad_output, SymInt[3] output_size, SymInt[5] input_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: upsample_trilinear3d_backward_out_cpu + CUDA: upsample_trilinear3d_backward_out_cuda + MPS: upsample_trilinear3d_backward_out_mps + +- func: upsample_trilinear3d_backward(Tensor grad_output, SymInt[3] output_size, SymInt[5] input_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: upsample_trilinear3d_backward.grad_input + +- func: upsample_nearest1d.out(Tensor self, SymInt[1] output_size, float? scales=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: upsample_nearest1d_out_cpu + CUDA: upsample_nearest1d_out_cuda + MPS: upsample_nearest1d_out_mps + +- func: _upsample_nearest_exact1d.out(Tensor self, SymInt[1] output_size, float? scales=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: _upsample_nearest_exact1d_out_cpu + CUDA: _upsample_nearest_exact1d_out_cuda + MPS: _upsample_nearest_exact1d_out_mps + +- func: upsample_nearest1d(Tensor self, SymInt[1] output_size, float? scales=None) -> Tensor + python_module: nn + structured_delegate: upsample_nearest1d.out + +- func: _upsample_nearest_exact1d(Tensor self, SymInt[1] output_size, float? scales=None) -> Tensor + python_module: nn + structured_delegate: _upsample_nearest_exact1d.out + +- func: upsample_nearest1d_backward.grad_input(Tensor grad_output, SymInt[1] output_size, SymInt[3] input_size, float? scales=None, *, Tensor(a!) grad_input) -> Tensor(a!) 
+ python_module: nn + structured: True + dispatch: + CPU: upsample_nearest1d_backward_out_cpu + CUDA: upsample_nearest1d_backward_out_cuda + MPS: upsample_nearest1d_backward_out_mps + +- func: _upsample_nearest_exact1d_backward.grad_input(Tensor grad_output, SymInt[1] output_size, SymInt[3] input_size, float? scales=None, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: _upsample_nearest_exact1d_backward_out_cpu + CUDA: _upsample_nearest_exact1d_backward_out_cuda + MPS: _upsample_nearest_exact1d_backward_out_mps + +- func: upsample_nearest1d_backward(Tensor grad_output, SymInt[1] output_size, SymInt[3] input_size, float? scales=None) -> Tensor + python_module: nn + structured_delegate: upsample_nearest1d_backward.grad_input + +- func: _upsample_nearest_exact1d_backward(Tensor grad_output, SymInt[1] output_size, SymInt[3] input_size, float? scales=None) -> Tensor + python_module: nn + structured_delegate: _upsample_nearest_exact1d_backward.grad_input + +- func: upsample_nearest2d.out(Tensor self, SymInt[2] output_size, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: upsample_nearest2d_out_cpu + CUDA: upsample_nearest2d_out_cuda + MPS: upsample_nearest2d_out_mps + +- func: _upsample_nearest_exact2d.out(Tensor self, SymInt[2] output_size, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: _upsample_nearest_exact2d_out_cpu + CUDA: _upsample_nearest_exact2d_out_cuda + MPS: _upsample_nearest_exact2d_out_mps + +- func: upsample_nearest2d(Tensor self, SymInt[2] output_size, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: upsample_nearest2d.out + dispatch: + QuantizedCPU: upsample_nearest2d_quantized_cpu + +- func: _upsample_nearest_exact2d(Tensor self, SymInt[2] output_size, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: _upsample_nearest_exact2d.out + dispatch: + QuantizedCPU: _upsample_nearest_exact2d_quantized_cpu + +- func: upsample_nearest2d_backward.grad_input(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: upsample_nearest2d_backward_out_cpu + CUDA: upsample_nearest2d_backward_out_cuda + MPS: upsample_nearest2d_backward_out_mps + +- func: _upsample_nearest_exact2d_backward.grad_input(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: _upsample_nearest_exact2d_backward_out_cpu + CUDA: _upsample_nearest_exact2d_backward_out_cuda + MPS: _upsample_nearest_exact2d_backward_out_mps + +- func: upsample_nearest2d_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: upsample_nearest2d_backward.grad_input + +- func: _upsample_nearest_exact2d_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: _upsample_nearest_exact2d_backward.grad_input + +- func: upsample_nearest3d.out(Tensor self, SymInt[3] output_size, float? scales_d=None, float? 
scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: upsample_nearest3d_out_cpu + CUDA: upsample_nearest3d_out_cuda + MPS: upsample_nearest3d_out_mps + +- func: _upsample_nearest_exact3d.out(Tensor self, SymInt[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: _upsample_nearest_exact3d_out_cpu + CUDA: _upsample_nearest_exact3d_out_cuda + MPS: _upsample_nearest_exact3d_out_mps + +- func: upsample_nearest3d(Tensor self, SymInt[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: upsample_nearest3d.out + dispatch: + QuantizedCPU: upsample_nearest3d_quantized_cpu + +- func: _upsample_nearest_exact3d(Tensor self, SymInt[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: _upsample_nearest_exact3d.out + dispatch: + QuantizedCPU: _upsample_nearest_exact3d_quantized_cpu + +- func: upsample_nearest3d_backward.grad_input(Tensor grad_output, SymInt[3] output_size, SymInt[5] input_size, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: upsample_nearest3d_backward_out_cpu + CUDA: upsample_nearest3d_backward_out_cuda + MPS: upsample_nearest3d_backward_out_mps + +- func: _upsample_nearest_exact3d_backward.grad_input(Tensor grad_output, SymInt[3] output_size, SymInt[5] input_size, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + dispatch: + CPU: _upsample_nearest_exact3d_backward_out_cpu + CUDA: _upsample_nearest_exact3d_backward_out_cuda + MPS: _upsample_nearest_exact3d_backward_out_mps + +- func: upsample_nearest3d_backward(Tensor grad_output, SymInt[3] output_size, SymInt[5] input_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: upsample_nearest3d_backward.grad_input + +- func: _upsample_nearest_exact3d_backward(Tensor grad_output, SymInt[3] output_size, SymInt[5] input_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor + python_module: nn + structured_delegate: _upsample_nearest_exact3d_backward.grad_input + +- func: sigmoid_backward.grad_input(Tensor grad_output, Tensor output, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: sigmoid_backward_out + MPS: sigmoid_backward_out_mps + tags: pointwise + +- func: sigmoid_backward(Tensor grad_output, Tensor output) -> Tensor + python_module: nn + structured_delegate: sigmoid_backward.grad_input + tags: pointwise + +- func: logit_backward.grad_input(Tensor grad_output, Tensor self, float? eps=None, *, Tensor(a!) grad_input) -> Tensor(a!) + python_module: nn + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: logit_backward_out + MPS: logit_backward_out_mps + tags: pointwise + +- func: logit_backward(Tensor grad_output, Tensor self, float? eps=None) -> Tensor + python_module: nn + structured_delegate: logit_backward.grad_input + tags: pointwise + +- func: tanh_backward.grad_input(Tensor grad_output, Tensor output, *, Tensor(a!) grad_input) -> Tensor(a!) 
+ python_module: nn
+ structured: True
+ structured_inherits: TensorIteratorBase
+ dispatch:
+ CPU, CUDA, MTIA: tanh_backward_out
+ MPS: tanh_backward_out_mps
+ tags: pointwise
+
+- func: tanh_backward(Tensor grad_output, Tensor output) -> Tensor
+ python_module: nn
+ structured_delegate: tanh_backward.grad_input
+ tags: pointwise
+
+# What's a thnn_conv_ versus a slow_conv_?
+#
+# Historically, we have inefficient implementations of convolutions
+# coming from the THNN/THCUNN library. These convolutions typically
+# operated by computing the Toeplitz matrix and then doing a matrix
+# multiply with the input; this is very memory inefficient! However,
+# occasionally, we really don't have anything better, so it's helpful
+# to have these fallbacks when there is no more optimized implementation
+# in cudnn or mkldnn, etc. Both thnn_ and slow_ convolutions fall
+# into this bucket.
+#
+# The difference between these two designations is that thnn_ refers
+# to a convolution that is still written in the "legacy" style; that is,
+# C code in the THNN/ or THCUNN/ directory. A slow_ convolution is
+# one that is written in the native style: modern C++. Algorithmically,
+# these are the same thing, but we give them different prefixes to
+# make the operational distinction clear.
+
+- func: slow_conv_transpose2d.out(Tensor self, Tensor weight, SymInt[2] kernel_size, Tensor? bias=None, SymInt[2] stride=1, SymInt[2] padding=0, SymInt[2] output_padding=0, SymInt[2] dilation=1, *, Tensor(a!) out) -> Tensor(a!)
+ python_module: nn
+ structured: True
+ dispatch:
+ CPU: slow_conv_transpose2d_structured_cpu
+ CUDA: slow_conv_transpose2d_structured_cuda
+
+- func: slow_conv_transpose2d(Tensor self, Tensor weight, SymInt[2] kernel_size, Tensor? bias=None, SymInt[2] stride=1, SymInt[2] padding=0, SymInt[2] output_padding=0, SymInt[2] dilation=1) -> Tensor
+ python_module: nn
+ structured_delegate: slow_conv_transpose2d.out
+
+- func: slow_conv_transpose3d.out(Tensor self, Tensor weight, SymInt[3] kernel_size, Tensor? bias=None, SymInt[3] stride=1, SymInt[3] padding=0, SymInt[3] output_padding=0, SymInt[3] dilation=1, *, Tensor(a!) out) -> Tensor(a!)
+ python_module: nn
+ dispatch:
+ CPU: slow_conv_transpose3d_out_cpu
+ CUDA: slow_conv_transpose3d_out_cuda
+
+- func: slow_conv_transpose3d(Tensor self, Tensor weight, SymInt[3] kernel_size, Tensor? bias=None, SymInt[3] stride=1, SymInt[3] padding=0, SymInt[3] output_padding=0, SymInt[3] dilation=1) -> Tensor
+ python_module: nn
+ dispatch:
+ CPU: slow_conv_transpose3d_cpu
+ CUDA: slow_conv_transpose3d_cuda
+
+- func: thnn_conv2d.out(Tensor self, Tensor weight, SymInt[2] kernel_size, Tensor? bias=None, SymInt[2] stride=1, SymInt[2] padding=0, *, Tensor(a!) out) -> Tensor(a!)
+ python_module: nn
+
+- func: thnn_conv2d(Tensor self, Tensor weight, SymInt[2] kernel_size, Tensor? bias=None, SymInt[2] stride=1, SymInt[2] padding=0) -> Tensor
+ python_module: nn
+
+- func: _slow_conv2d_forward.output(Tensor self, Tensor weight, SymInt[2] kernel_size, Tensor? bias, SymInt[2] stride, SymInt[2] padding, *, Tensor(a!) output) -> Tensor(a!)
+ python_module: nn
+ dispatch:
+ CPU: slow_conv2d_forward_out_cpu
+ CUDA: slow_conv2d_forward_out_cuda
+
+- func: _slow_conv2d_forward(Tensor self, Tensor weight, SymInt[2] kernel_size, Tensor?
bias, SymInt[2] stride, SymInt[2] padding) -> Tensor + python_module: nn + dispatch: + CPU: slow_conv2d_forward_cpu + CUDA: slow_conv2d_forward_cuda + +- func: _slow_conv2d_backward.grad_input(Tensor grad_output, Tensor self, Tensor weight, SymInt[2] kernel_size, SymInt[2] stride, SymInt[2] padding, *, Tensor(a!) grad_input, Tensor(b!) grad_weight, Tensor(c!) grad_bias) -> (Tensor(a!), Tensor(b!), Tensor(c!)) + python_module: nn + dispatch: + CPU: slow_conv2d_backward_out_cpu + CUDA: slow_conv2d_backward_out_cuda + +- func: _slow_conv2d_backward.output_mask(Tensor grad_output, Tensor self, Tensor weight, SymInt[2] kernel_size, SymInt[2] stride, SymInt[2] padding, bool[3] output_mask) -> (Tensor grad_input, Tensor grad_weight, Tensor grad_bias) + python_module: nn + dispatch: + CPU: slow_conv2d_backward_cpu + CUDA: slow_conv2d_backward_cuda + autogen: _slow_conv2d_backward.output_mask_out + +- func: _conv_depthwise2d.out(Tensor self, Tensor weight, SymInt[2] kernel_size, Tensor? bias, SymInt[2] stride, SymInt[2] padding, SymInt[2] dilation, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + dispatch: + CUDA: conv_depthwise2d_cuda_out + +- func: _conv_depthwise2d(Tensor self, Tensor weight, SymInt[2] kernel_size, Tensor? bias, SymInt[2] stride, SymInt[2] padding, SymInt[2] dilation) -> Tensor + python_module: nn + dispatch: + CUDA: conv_depthwise2d_cuda + +- func: conv_depthwise3d(Tensor self, Tensor weight, SymInt[3] kernel_size, Tensor? bias, SymInt[3] stride, SymInt[3] padding, SymInt[3] dilation) -> Tensor + python_module: nn + dispatch: + CUDA: conv_depthwise3d_cuda + autogen: conv_depthwise3d.out + +- func: slow_conv3d.out(Tensor self, Tensor weight, SymInt[3] kernel_size, Tensor? bias=None, SymInt[3] stride=1, SymInt[3] padding=0, *, Tensor(a!) out) -> Tensor(a!) + python_module: nn + +- func: slow_conv3d(Tensor self, Tensor weight, SymInt[3] kernel_size, Tensor? bias=None, SymInt[3] stride=1, SymInt[3] padding=0) -> Tensor + python_module: nn + +- func: slow_conv3d_forward.output(Tensor self, Tensor weight, SymInt[3] kernel_size, Tensor? bias, SymInt[3] stride, SymInt[3] padding, *, Tensor(a!) output) -> Tensor(a!) + python_module: nn + dispatch: + CPU: slow_conv3d_forward_out_cpu + +- func: slow_conv3d_forward(Tensor self, Tensor weight, SymInt[3] kernel_size, Tensor? bias, SymInt[3] stride, SymInt[3] padding) -> Tensor + python_module: nn + dispatch: + CPU: slow_conv3d_forward_cpu + +- func: slow_conv_dilated2d(Tensor self, Tensor weight, SymInt[2] kernel_size, Tensor? bias=None, SymInt[2] stride=1, SymInt[2] padding=0, SymInt[2] dilation=1) -> Tensor + python_module: nn + dispatch: + CPU: slow_conv_dilated2d_cpu + CUDA: slow_conv_dilated2d_cuda + autogen: slow_conv_dilated2d.out + +- func: slow_conv_dilated3d(Tensor self, Tensor weight, SymInt[3] kernel_size, Tensor? bias=None, SymInt[3] stride=1, SymInt[3] padding=0, SymInt[3] dilation=1) -> Tensor + python_module: nn + dispatch: + CPU: slow_conv_dilated3d_cpu + CUDA: slow_conv_dilated3d_cuda + autogen: slow_conv_dilated3d.out + +- func: col2im.out(Tensor self, SymInt[2] output_size, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride, *, Tensor(a!) out) -> Tensor(a!) 
+ python_module: nn
+ dispatch:
+ CPU: col2im_out_cpu
+ CUDA: col2im_out_cuda
+ MPS: col2im_out_mps
+
+- func: col2im(Tensor self, SymInt[2] output_size, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride) -> Tensor
+ python_module: nn
+ dispatch:
+ CPU: col2im_cpu
+ CUDA: col2im_cuda
+ MPS: col2im_mps
+ tags: core
+
+- func: column_stack(Tensor[] tensors) -> Tensor
+
+- func: column_stack.out(Tensor[] tensors, *, Tensor(a!) out) -> Tensor(a!)
+
+- func: im2col.out(Tensor self, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride, *, Tensor(a!) out) -> Tensor(a!)
+ python_module: nn
+ dispatch:
+ CPU: im2col_out_cpu
+ CUDA: im2col_out_cuda
+ MPS: im2col_out_mps
+
+- func: im2col(Tensor self, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride) -> Tensor
+ python_module: nn
+ dispatch:
+ CPU: im2col_cpu
+ CUDA: im2col_cuda
+ MPS: im2col_mps
+
+- func: isfinite(Tensor self) -> Tensor
+ variants: function, method
+ device_check: NoCheck
+ device_guard: False
+ tags: pointwise
+
+- func: isinf(Tensor self) -> Tensor
+ variants: function, method
+ device_check: NoCheck
+ device_guard: False
+ dispatch:
+ CompositeExplicitAutograd: isinf
+ NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_isinf
+ SparseCPU, SparseCUDA, SparseMPS: isinf_sparse
+ SparseMeta: isinf_sparse_meta
+ SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: isinf_sparse_csr
+ autogen: isinf.out
+ tags: [core, pointwise]
+
+- func: record_stream(Tensor(a!) self, Stream s) -> ()
+ variants: method
+ dispatch:
+ CUDA: record_stream_cuda
+
+- func: isposinf(Tensor self) -> Tensor
+ variants: function, method
+ structured_delegate: isposinf.out
+ dispatch:
+ NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_isposinf
+ SparseCPU, SparseCUDA, SparseMPS: isposinf_sparse
+ SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: isposinf_sparse_csr
+ tags: pointwise
+
+- func: isposinf.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
+ structured: True
+ structured_inherits: TensorIteratorBase
+ dispatch:
+ CPU, CUDA, MPS: isposinf_out
+ SparseCPU, SparseCUDA, SparseMPS: isposinf_sparse_out
+ SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: isposinf_sparse_csr_out
+ tags: pointwise
+
+- func: isneginf(Tensor self) -> Tensor
+ variants: function, method
+ structured_delegate: isneginf.out
+ dispatch:
+ NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: NestedTensor_isneginf
+ SparseCPU, SparseCUDA, SparseMPS: isneginf_sparse
+ SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: isneginf_sparse_csr
+ tags: pointwise
+
+- func: isneginf.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
+ structured: True
+ structured_inherits: TensorIteratorBase
+ dispatch:
+ CPU, CUDA, MPS: isneginf_out
+ SparseCPU, SparseCUDA, SparseMPS: isneginf_sparse_out
+ SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: isneginf_sparse_csr_out
+ tags: pointwise
+
+# NOTE [_add_batch_dim and _remove_batch_dim]
+# _add_batch_dim and _remove_batch_dim are meant to be used in the implementation
+# of the vmap frontend API (see torch/_vmap_internals.py). They are not
+# user-facing, hence the leading underscore. Please don't use them anywhere else.
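+#
+# Purely as an illustrative sketch (this comment adds no schema; the two
+# nesting calls below are internal torch._C APIs and may change between
+# releases), the vmap frontend brackets a batched computation with this
+# pair roughly as follows, using the signatures declared just below:
+#
+#   import torch
+#   x = torch.randn(3, 5)                          # dim 0 is the dim being vmapped over
+#   level = torch._C._vmap_increment_nesting()     # enter a new vmap nesting level
+#   bx = torch._add_batch_dim(x, 0, level)         # hide dim 0; ops now see shape [5]
+#   y = torch.sin(bx)                              # written once, runs batched over 3 elements
+#   out = torch._remove_batch_dim(y, level, 3, 0)  # restore the size-3 batch at dim 0
+#   torch._C._vmap_decrement_nesting()             # leave the vmap nesting level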
+- func: _add_batch_dim(Tensor self, int batch_dim, int level) -> Tensor + variants: function + +# See NOTE [_add_batch_dim and _remove_batch_dim] +- func: _remove_batch_dim(Tensor self, int level, SymInt batch_size, int out_dim) -> Tensor + variants: function + +## Functions related to the `torch.special` namespace +# Note [special namespace binding] +# Functions in the special python module should have their names start with +# "special_" and be bound to the desired Python name in +# torch/special/__init__.py, and the desired C++ name in torch/csrc/api/include/torch/special.h. +# The "special_" names should be hidden from the user and not documented. + +- func: special_entr(Tensor self) -> Tensor + structured_delegate: special_entr.out + python_module: special + variants: function + tags: pointwise + +- func: special_entr.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + python_module: special + variants: function + dispatch: + CPU, CUDA, MPS: special_entr_out + tags: pointwise + +- func: special_ndtri(Tensor self) -> Tensor + structured_delegate: special_ndtri.out + python_module: special + variants: function + tags: pointwise + +- func: special_ndtri.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + python_module: special + variants: function + dispatch: + CPU, CUDA: special_ndtri_out + tags: pointwise + +- func: special_log_ndtr(Tensor self) -> Tensor + structured_delegate: special_log_ndtr.out + python_module: special + variants: function + tags: pointwise + +- func: special_log_ndtr.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + structured: True + structured_inherits: TensorIteratorBase + python_module: special + variants: function + dispatch: + CPU, CUDA: special_log_ndtr_out + tags: pointwise + +- func: special_expm1(Tensor self) -> Tensor + python_module: special + variants: function + +- func: special_expm1.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + variants: function + +- func: special_exp2(Tensor self) -> Tensor + python_module: special + variants: function + +- func: special_exp2.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + variants: function + +- func: special_psi(Tensor self) -> Tensor + python_module: special + variants: function + +- func: special_psi.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + variants: function + +- func: special_digamma(Tensor self) -> Tensor + python_module: special + variants: function + +- func: special_digamma.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + variants: function + +- func: special_gammaln(Tensor self) -> Tensor + python_module: special + variants: function + +- func: special_gammaln.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + variants: function + +- func: special_erf(Tensor self) -> Tensor + python_module: special + variants: function + +- func: special_erf.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + variants: function + +- func: special_erfc(Tensor self) -> Tensor + python_module: special + variants: function +
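To illustrate the binding convention in the note above, a minimal sketch, assuming a PyTorch build where special_entr is bound as torch.special.entr per this schema:

import torch

x = torch.tensor([0.25, 0.5, 1.0])
# The native special_entr surfaces as torch.special.entr;
# for positive x, entr(x) = -x * log(x) elementwise.
print(torch.allclose(torch.special.entr(x), -x * torch.log(x)))  # True

+- func: special_erfc.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)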
+ python_module: special + +- func: special_erfcx(Tensor self) -> Tensor + python_module: special + variants: function + structured_delegate: special_erfcx.out + tags: pointwise + +- func: special_erfcx.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA: special_erfcx_out + tags: pointwise + +- func: special_erfinv(Tensor self) -> Tensor + python_module: special + variants: function + +- func: special_erfinv.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + +- func: special_ndtr(Tensor self) -> Tensor + python_module: special + variants: function + +- func: special_ndtr.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + variants: function + +- func: special_xlog1py(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + python_module: special + variants: function + structured_delegate: special_xlog1py.out + tags: pointwise + +- func: special_xlog1py.self_scalar(Scalar self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + python_module: special + variants: function + dispatch: + CompositeExplicitAutograd: special_xlog1py + tags: pointwise + +- func: special_xlog1py.other_scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + python_module: special + variants: function + dispatch: + CompositeExplicitAutograd: special_xlog1py + tags: pointwise + +- func: special_xlog1py.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + python_module: special + variants: function + dispatch: + CPU, CUDA, MPS: special_xlog1py_out + tags: pointwise + +- func: special_xlog1py.self_scalar_out(Scalar self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + python_module: special + variants: function + dispatch: + CompositeExplicitAutograd: special_xlog1py_out + tags: pointwise + +- func: special_xlog1py.other_scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + python_module: special + variants: function + dispatch: + CompositeExplicitAutograd: special_xlog1py_out + tags: pointwise + +- func: special_xlogy(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + python_module: special + variants: function + +- func: special_xlogy.self_scalar(Scalar self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + python_module: special + variants: function + +- func: special_xlogy.other_scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + python_module: special + variants: function + +- func: special_xlogy.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + python_module: special + variants: function + +- func: special_xlogy.self_scalar_out(Scalar self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + python_module: special + variants: function + +- func: special_xlogy.other_scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) 
+ device_check: NoCheck # TensorIterator + python_module: special + variants: function + +- func: special_zeta(Tensor self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + python_module: special + variants: function + structured_delegate: special_zeta.out + tags: pointwise + +- func: special_zeta.self_scalar(Scalar self, Tensor other) -> Tensor + device_check: NoCheck # TensorIterator + python_module: special + variants: function + dispatch: + CompositeExplicitAutograd: special_zeta + tags: pointwise + +- func: special_zeta.other_scalar(Tensor self, Scalar other) -> Tensor + device_check: NoCheck # TensorIterator + python_module: special + variants: function + dispatch: + CompositeExplicitAutograd: special_zeta + tags: pointwise + +- func: special_zeta.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + structured: True + structured_inherits: TensorIteratorBase + python_module: special + variants: function + dispatch: + CPU, CUDA, MPS: special_zeta_out + tags: pointwise + +- func: special_zeta.self_scalar_out(Scalar self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + python_module: special + variants: function + dispatch: + CompositeExplicitAutograd: special_zeta_out + tags: pointwise + +- func: special_zeta.other_scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + python_module: special + variants: function + dispatch: + CompositeExplicitAutograd: special_zeta_out + tags: pointwise + +- func: special_i0(Tensor self) -> Tensor + python_module: special + variants: function + +- func: special_i0.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + variants: function + +- func: special_i0e(Tensor self) -> Tensor + python_module: special + variants: function + structured_delegate: special_i0e.out + tags: pointwise + +- func: special_i0e.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: special_i0e_out + tags: pointwise + +- func: special_i1(Tensor self) -> Tensor + python_module: special + variants: function + structured_delegate: special_i1.out + tags: pointwise + +- func: special_i1.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: special_i1_out + tags: pointwise + +- func: special_i1e(Tensor self) -> Tensor + python_module: special + variants: function + structured_delegate: special_i1e.out + tags: pointwise + +- func: special_i1e.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + structured: True + structured_inherits: TensorIteratorBase + dispatch: + CPU, CUDA, MPS: special_i1e_out + tags: pointwise + +- func: special_logit(Tensor self, float? eps=None) -> Tensor + python_module: special + variants: function + +- func: special_logit.out(Tensor self, float? eps=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + +- func: special_polygamma(int n, Tensor self) -> Tensor + python_module: special + variants: function + +- func: special_polygamma.out(int n, Tensor self, *, Tensor(a!) out) -> Tensor(a!) 
+ python_module: special + +- func: special_logsumexp(Tensor self, int[1] dim, bool keepdim=False) -> Tensor + python_module: special + variants: function + +- func: special_logsumexp.out(Tensor self, int[1] dim, bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + +- func: special_expit(Tensor self) -> Tensor + python_module: special + variants: function + +- func: special_expit.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + variants: function + +- func: special_sinc(Tensor self) -> Tensor + python_module: special + variants: function + +- func: special_sinc.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + variants: function + +- func: special_round(Tensor self, *, int decimals=0) -> Tensor + python_module: special + variants: function + +- func: special_round.out(Tensor self, *, int decimals=0, Tensor(a!) out) -> Tensor(a!) + python_module: special + variants: function + +- func: special_log1p(Tensor self) -> Tensor + python_module: special + variants: function + +- func: special_log1p.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + variants: function + +- func: special_log_softmax(Tensor self, int dim, *, ScalarType? dtype=None) -> Tensor + python_module: special + variants: function + +- func: special_gammainc.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + variants: function + +- func: special_gammainc(Tensor self, Tensor other) -> Tensor + python_module: special + variants: function + +- func: special_gammaincc.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + variants: function + +- func: special_gammaincc(Tensor self, Tensor other) -> Tensor + python_module: special + variants: function + +- func: special_multigammaln(Tensor self, int p) -> Tensor + python_module: special + variants: function + +- func: special_multigammaln.out(Tensor self, int p, *, Tensor(a!) out) -> Tensor(a!) + python_module: special + variants: function + +- func: special_softmax(Tensor self, int dim, ScalarType? dtype=None) -> Tensor + python_module: special + variants: function + +## Functions related to the fast Fourier transform and the torch.fft namespace +# Note [FFT namespace binding] +# Functions in the fft python module should have their names start with +# "fft_" and be bound to the desired Python name in +# torch/fft/__init__.py, and the desired C++ name in torch/csrc/api/include/torch/fft.h. +# The "fft_" names should be hidden from the user and not documented. +# +# See fft_fft as an example. + +# torch.fft.fft +# NOTE: NOT an alias for torch.fft, which has different semantics +- func: fft_fft(Tensor self, SymInt? n=None, int dim=-1, str? norm=None) -> Tensor + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_fft_symint + +- func: fft_fft.out(Tensor self, SymInt? n=None, int dim=-1, str? norm=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_fft_symint_out + +- func: fft_ifft(Tensor self, SymInt? n=None, int dim=-1, str? norm=None) -> Tensor + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_ifft_symint +
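Mirroring the FFT note above, the fft_* schemas surface under torch.fft; a minimal round-trip sketch (norm="ortho" makes the transform unitary, so ifft undoes fft up to rounding):

import torch

t = torch.randn(16, dtype=torch.complex64)
f = torch.fft.fft(t, n=16, norm="ortho")   # fft_fft: n pads/trims, norm scales
r = torch.fft.ifft(f, norm="ortho")        # fft_ifft with the matching norm
print(torch.allclose(r, t, atol=1e-5))     # True

+- func: fft_ifft.out(Tensor self, SymInt? n=None, int dim=-1, str? norm=None, *, Tensor(a!) out) -> Tensor(a!)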
+ python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_ifft_symint_out + +- func: fft_rfft(Tensor self, SymInt? n=None, int dim=-1, str? norm=None) -> Tensor + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_rfft_symint + +- func: fft_rfft.out(Tensor self, SymInt? n=None, int dim=-1, str? norm=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_rfft_symint_out + +- func: fft_irfft(Tensor self, SymInt? n=None, int dim=-1, str? norm=None) -> Tensor + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_irfft_symint + +- func: fft_irfft.out(Tensor self, SymInt? n=None, int dim=-1, str? norm=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_irfft_symint_out + +- func: fft_hfft(Tensor self, SymInt? n=None, int dim=-1, str? norm=None) -> Tensor + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_hfft_symint + +- func: fft_hfft.out(Tensor self, SymInt? n=None, int dim=-1, str? norm=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_hfft_symint_out + +- func: fft_ihfft(Tensor self, SymInt? n=None, int dim=-1, str? norm=None) -> Tensor + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_ihfft_symint + +- func: fft_ihfft.out(Tensor self, SymInt? n=None, int dim=-1, str? norm=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_ihfft_symint_out + +- func: fft_fft2(Tensor self, SymInt[1]? s=None, int[1] dim=[-2,-1], str? norm=None) -> Tensor + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_fft2_symint + +- func: fft_fft2.out(Tensor self, SymInt[1]? s=None, int[1] dim=[-2,-1], str? norm=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_fft2_symint_out + +- func: fft_ifft2(Tensor self, SymInt[1]? s=None, int[1] dim=[-2,-1], str? norm=None) -> Tensor + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_ifft2_symint + +- func: fft_ifft2.out(Tensor self, SymInt[1]? s=None, int[1] dim=[-2,-1], str? norm=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_ifft2_symint_out + +- func: fft_rfft2(Tensor self, SymInt[1]? s=None, int[1] dim=[-2,-1], str? norm=None) -> Tensor + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_rfft2_symint + +- func: fft_rfft2.out(Tensor self, SymInt[1]? s=None, int[1] dim=[-2,-1], str? norm=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_rfft2_symint_out + +- func: fft_irfft2(Tensor self, SymInt[1]? s=None, int[1] dim=[-2,-1], str? norm=None) -> Tensor + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_irfft2_symint + +- func: fft_irfft2.out(Tensor self, SymInt[1]? s=None, int[1] dim=[-2,-1], str? norm=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_irfft2_symint_out + +- func: fft_hfft2(Tensor self, SymInt[1]? s=None, int[1] dim=[-2,-1], str? 
norm=None) -> Tensor + use_const_ref_for_mutable_tensors: True + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_hfft2_symint + +- func: fft_hfft2.out(Tensor self, SymInt[1]? s=None, int[1] dim=[-2,-1], str? norm=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_hfft2_symint_out + +- func: fft_ihfft2(Tensor self, SymInt[1]? s=None, int[1] dim=[-2,-1], str? norm=None) -> Tensor + use_const_ref_for_mutable_tensors: True + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_ihfft2_symint + +- func: fft_ihfft2.out(Tensor self, SymInt[1]? s=None, int[1] dim=[-2,-1], str? norm=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_ihfft2_symint_out + +- func: fft_fftn(Tensor self, SymInt[1]? s=None, int[1]? dim=None, str? norm=None) -> Tensor + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_fftn_symint + +- func: fft_fftn.out(Tensor self, SymInt[1]? s=None, int[1]? dim=None, str? norm=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_fftn_symint_out + +- func: fft_ifftn(Tensor self, SymInt[1]? s=None, int[1]? dim=None, str? norm=None) -> Tensor + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_ifftn_symint + +- func: fft_ifftn.out(Tensor self, SymInt[1]? s=None, int[1]? dim=None, str? norm=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_ifftn_symint_out + +- func: fft_rfftn(Tensor self, SymInt[1]? s=None, int[1]? dim=None, str? norm=None) -> Tensor + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_rfftn_symint + +- func: fft_rfftn.out(Tensor self, SymInt[1]? s=None, int[1]? dim=None, str? norm=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_rfftn_symint_out + +- func: fft_irfftn(Tensor self, SymInt[1]? s=None, int[1]? dim=None, str? norm=None) -> Tensor + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_irfftn_symint + +- func: fft_irfftn.out(Tensor self, SymInt[1]? s=None, int[1]? dim=None, str? norm=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_irfftn_symint_out + +- func: fft_hfftn(Tensor self, SymInt[1]? s=None, int[1]? dim=None, str? norm=None) -> Tensor + use_const_ref_for_mutable_tensors: True + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_hfftn_symint + +- func: fft_hfftn.out(Tensor self, SymInt[1]? s=None, int[1]? dim=None, str? norm=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_hfftn_symint_out + +- func: fft_ihfftn(Tensor self, SymInt[1]? s=None, int[1]? dim=None, str? norm=None) -> Tensor + use_const_ref_for_mutable_tensors: True + python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_ihfftn_symint + +- func: fft_ihfftn.out(Tensor self, SymInt[1]? s=None, int[1]? dim=None, str? norm=None, *, Tensor(a!) out) -> Tensor(a!) 
+ python_module: fft + variants: function + dispatch: + CompositeImplicitAutograd: fft_ihfftn_symint_out + +- func: fft_fftfreq(int n, float d=1.0, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + python_module: fft + variants: function + dispatch: + CompositeExplicitAutograd: fft_fftfreq + +- func: fft_fftfreq.out(int n, float d=1.0, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeExplicitAutograd: fft_fftfreq_out + +- func: fft_rfftfreq(int n, float d=1.0, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + python_module: fft + variants: function + dispatch: + CompositeExplicitAutograd: fft_rfftfreq + +- func: fft_rfftfreq.out(int n, float d=1.0, *, Tensor(a!) out) -> Tensor(a!) + python_module: fft + variants: function + dispatch: + CompositeExplicitAutograd: fft_rfftfreq_out + +- func: fft_fftshift(Tensor self, int[1]? dim=None) -> Tensor + python_module: fft + variants: function + +- func: fft_ifftshift(Tensor self, int[1]? dim=None) -> Tensor + python_module: fft + variants: function + +## Functions for linear algebra and the torch.linalg namespace +# Note [linalg namespace binding] +# Functions in the linalg python module should have their names start with +# "linalg_" and be bound to the desired Python name in +# torch/linalg/__init__.py, and the desired C++ name in torch/csrc/api/include/torch/linalg.h. +# The "linalg_" names should be hidden from the user and not documented. +# +# See linalg_det as an example. + +# "_ex" stands for experimental +- func: linalg_cholesky_ex(Tensor self, *, bool upper=False, bool check_errors=False) -> (Tensor L, Tensor info) + python_module: linalg + structured_delegate: linalg_cholesky_ex.L + +- func: linalg_cholesky_ex.L(Tensor self, *, bool upper=False, bool check_errors=False, Tensor(a!) L, Tensor(b!) info) -> (Tensor(a!) L, Tensor(b!) info) + python_module: linalg + structured: True + dispatch: + CPU, CUDA, MPS: linalg_cholesky_ex_out + +- func: linalg_cholesky(Tensor self, *, bool upper=False) -> Tensor + python_module: linalg + +- func: linalg_cholesky.out(Tensor self, *, bool upper=False, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + +- func: linalg_cross(Tensor self, Tensor other, *, int dim=-1) -> Tensor + python_module: linalg + variants: function + structured_delegate: linalg_cross.out + dispatch: + ZeroTensor: linalg_cross_zerotensor + +- func: linalg_cross.out(Tensor self, Tensor other, *, int dim=-1, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + structured: True + dispatch: + CPU, CUDA, MPS: linalg_cross_out + +# linalg.lu_factor +- func: linalg_lu_factor(Tensor A, *, bool pivot=True) -> (Tensor LU, Tensor pivots) + python_module: linalg + variants: function + +- func: linalg_lu_factor.out(Tensor A, *, bool pivot=True, Tensor(a!) LU, Tensor(b!) pivots) -> (Tensor(a!) LU, Tensor(b!) pivots) + python_module: linalg + variants: function + +- func: linalg_lu_factor_ex(Tensor A, *, bool pivot=True, bool check_errors=False) -> (Tensor LU, Tensor pivots, Tensor info) + python_module: linalg + structured_delegate: linalg_lu_factor_ex.out + variants: function + +- func: linalg_lu_factor_ex.out(Tensor A, *, bool pivot=True, bool check_errors=False, Tensor(a!) LU, Tensor(b!) pivots, Tensor(c!) info) -> (Tensor(a!) LU, Tensor(b!) pivots, Tensor(c!) 
info) + python_module: linalg + variants: function + structured: True + dispatch: + CPU, CUDA: linalg_lu_factor_ex_out + MPS: linalg_lu_factor_ex_out_mps + +# linalg.lu +- func: linalg_lu(Tensor A, *, bool pivot=True) -> (Tensor P, Tensor L, Tensor U) + python_module: linalg + structured_delegate: linalg_lu.out + variants: function + +- func: linalg_lu.out(Tensor A, *, bool pivot=True, Tensor(a!) P, Tensor(b!) L, Tensor(c!) U) -> (Tensor(a!) P, Tensor(b!) L, Tensor(c!) U) + python_module: linalg + variants: function + structured: True + dispatch: + CPU, CUDA, MPS: linalg_lu_out + +# linalg.lu_solve +- func: linalg_lu_solve(Tensor LU, Tensor pivots, Tensor B, *, bool left=True, bool adjoint=False) -> Tensor + python_module: linalg + structured_delegate: linalg_lu_solve.out + variants: function + +- func: linalg_lu_solve.out(Tensor LU, Tensor pivots, Tensor B, *, bool left=True, bool adjoint=False, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + variants: function + structured: True + dispatch: + CPU, CUDA: linalg_lu_solve_out + +# linalg.det +- func: _linalg_det(Tensor A) -> (Tensor result, Tensor LU, Tensor pivots) + structured_delegate: _linalg_det.result + +- func: _linalg_det.result(Tensor A, *, Tensor(a!) result, Tensor(b!) LU, Tensor(c!) pivots) -> (Tensor(a!) result, Tensor(b!) LU, Tensor(c!) pivots) + structured: True + dispatch: + CPU, CUDA, MPS: _linalg_det_out + +- func: linalg_det(Tensor A) -> Tensor + python_module: linalg + variants: function + +- func: linalg_det.out(Tensor A, *, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + +# torch.det, alias for torch.linalg.det +- func: det(Tensor self) -> Tensor + variants: function, method + +- func: linalg_ldl_factor_ex(Tensor self, *, bool hermitian=False, bool check_errors=False) -> (Tensor LD, Tensor pivots, Tensor info) + structured_delegate: linalg_ldl_factor_ex.out + python_module: linalg + variants: function + +- func: linalg_ldl_factor_ex.out(Tensor self, *, bool hermitian=False, bool check_errors=False, Tensor(a!) LD, Tensor(b!) pivots, Tensor(c!) info) -> (Tensor(a!) LD, Tensor(b!) pivots, Tensor(c!) info) + structured: True + python_module: linalg + variants: function + dispatch: + CPU, CUDA: linalg_ldl_factor_ex_out + +- func: linalg_ldl_factor(Tensor self, *, bool hermitian=False) -> (Tensor LD, Tensor pivots) + python_module: linalg + variants: function + +- func: linalg_ldl_factor.out(Tensor self, *, bool hermitian=False, Tensor(a!) LD, Tensor(b!) pivots) -> (Tensor(a!) LD, Tensor(b!) pivots) + python_module: linalg + variants: function + +- func: linalg_ldl_solve(Tensor LD, Tensor pivots, Tensor B, *, bool hermitian=False) -> Tensor + structured_delegate: linalg_ldl_solve.out + python_module: linalg + variants: function + +- func: linalg_ldl_solve.out(Tensor LD, Tensor pivots, Tensor B, *, bool hermitian=False, Tensor(a!) out) -> Tensor(a!) + structured: True + python_module: linalg + variants: function + dispatch: + CPU, CUDA: linalg_ldl_solve_out + +- func: linalg_lstsq(Tensor self, Tensor b, float? rcond=None, *, str? driver=None) -> (Tensor solution, Tensor residuals, Tensor rank, Tensor singular_values) + python_module: linalg + variants: function + dispatch: + CompositeExplicitAutograd: linalg_lstsq + tags: dynamic_output_shape + +- func: linalg_lstsq.out(Tensor self, Tensor b, float? rcond=None, *, str? driver=None, Tensor(a!) solution, Tensor(b!) residuals, Tensor(c!) rank, Tensor(d!) singular_values) -> (Tensor(a!) solution, Tensor(b!) residuals, Tensor(c!) rank, Tensor(d!) 
singular_values) + python_module: linalg + variants: function + dispatch: + CPU, CUDA: linalg_lstsq_out + tags: dynamic_output_shape + +# torch.linalg.matmul, alias for torch.matmul +- func: linalg_matmul(Tensor self, Tensor other) -> Tensor + python_module: linalg + variants: function + +- func: linalg_matmul.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + +- func: linalg_vecdot(Tensor x, Tensor y, *, int dim=-1) -> Tensor + python_module: linalg + variants: function + +- func: linalg_vecdot.out(Tensor x, Tensor y, *, int dim=-1, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + +- func: linalg_matrix_exp(Tensor self) -> Tensor + python_module: linalg + variants: function + dispatch: + CPU, CUDA: linalg_matrix_exp + autogen: linalg_matrix_exp.out + +- func: _linalg_slogdet(Tensor A) -> (Tensor sign, Tensor logabsdet, Tensor LU, Tensor pivots) + structured_delegate: _linalg_slogdet.sign + +- func: _linalg_slogdet.sign(Tensor A, *, Tensor(a!) sign, Tensor(b!) logabsdet, Tensor(c!) LU, Tensor(d!) pivots) -> (Tensor(a!) sign, Tensor(b!) logabsdet, Tensor(c!) LU, Tensor(d!) pivots) + structured: True + dispatch: + CPU, CUDA, MPS: _linalg_slogdet_out + +- func: linalg_slogdet(Tensor A) -> (Tensor sign, Tensor logabsdet) + python_module: linalg + +- func: linalg_slogdet.out(Tensor A, *, Tensor(a!) sign, Tensor(b!) logabsdet) -> (Tensor(a!) sign, Tensor(b!) logabsdet) + python_module: linalg + +- func: slogdet(Tensor self) -> (Tensor sign, Tensor logabsdet) + variants: function, method + +- func: slogdet.out(Tensor self, *, Tensor(a!) sign, Tensor(b!) logabsdet) -> (Tensor(a!) sign, Tensor(b!) logabsdet) + variants: function + +- func: logdet(Tensor self) -> Tensor + variants: function, method + +- func: linalg_eig(Tensor self) -> (Tensor eigenvalues, Tensor eigenvectors) + python_module: linalg + variants: function + dispatch: + CPU, CUDA: linalg_eig + +- func: linalg_eig.out(Tensor self, *, Tensor(a!) eigenvalues, Tensor(b!) eigenvectors) -> (Tensor(a!) eigenvalues, Tensor(b!) eigenvectors) + python_module: linalg + dispatch: + CPU, CUDA: linalg_eig_out + +- func: _linalg_eigvals(Tensor self) -> Tensor + python_module: linalg + dispatch: + CPU, CUDA: _linalg_eigvals + +- func: linalg_eigvals(Tensor self) -> Tensor + python_module: linalg + +- func: linalg_eigvals.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + dispatch: + CPU, CUDA: linalg_eigvals_out + +# This function exposes the `compute_v` flag, which is then used to implement `linalg.eigh` and +# `linalg.eigvalsh` as composite functions that call this one. +- func: _linalg_eigh(Tensor A, str UPLO="L", bool compute_v=True) -> (Tensor eigenvalues, Tensor eigenvectors) + structured_delegate: _linalg_eigh.eigenvalues + +- func: _linalg_eigh.eigenvalues(Tensor A, str UPLO="L", bool compute_v=True, *, Tensor(a!) eigenvalues, Tensor(b!) eigenvectors) -> (Tensor(a!) eigenvalues, Tensor(b!) eigenvectors) + structured: True + dispatch: + CPU, CUDA: _linalg_eigh_out + +- func: linalg_eigh(Tensor self, str UPLO="L") -> (Tensor eigenvalues, Tensor eigenvectors) + python_module: linalg + +- func: linalg_eigh.eigvals(Tensor self, str UPLO="L", *, Tensor(a!) eigvals, Tensor(b!) eigvecs) -> (Tensor(a!) eigenvalues, Tensor(b!) eigenvectors) + python_module: linalg + +- func: linalg_eigvalsh(Tensor self, str UPLO="L") -> Tensor + python_module: linalg +
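A minimal sketch of the composite pair described in the `compute_v` comment above, assuming the usual Python bindings torch.linalg.eigh and torch.linalg.eigvalsh:

import torch

a = torch.randn(4, 4)
a = a + a.mT  # eigh/eigvalsh expect a symmetric (Hermitian) matrix
w, v = torch.linalg.eigh(a)        # composite over _linalg_eigh (compute_v=True)
w_only = torch.linalg.eigvalsh(a)  # eigenvalues only (compute_v=False)
print(torch.allclose(w, w_only))   # True

+- func: linalg_eigvalsh.out(Tensor self, str UPLO="L", *, Tensor(a!) out) -> Tensor(a!)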
+ python_module: linalg + +- func: linalg_householder_product(Tensor input, Tensor tau) -> Tensor + python_module: linalg + variants: function + dispatch: + CPU, CUDA, MPS: linalg_householder_product + +- func: linalg_householder_product.out(Tensor input, Tensor tau, *, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + dispatch: + CPU, CUDA, MPS: linalg_householder_product_out + +- func: linalg_inv_ex(Tensor A, *, bool check_errors=False) -> (Tensor inverse, Tensor info) + python_module: linalg + structured_delegate: linalg_inv_ex.inverse + +- func: linalg_inv_ex.inverse(Tensor A, *, bool check_errors=False, Tensor(a!) inverse, Tensor(b!) info) -> (Tensor(a!) inverse, Tensor(b!) info) + python_module: linalg + structured: True + dispatch: + CPU, CUDA: linalg_inv_ex_out + MPS: linalg_inv_ex_out_mps + +- func: linalg_inv(Tensor A) -> Tensor + python_module: linalg + +- func: linalg_inv.out(Tensor A, *, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + +- func: inverse(Tensor self) -> Tensor + variants: function, method + +- func: inverse.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + +- func: inner(Tensor self, Tensor other) -> Tensor + variants: function, method + +- func: inner.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + +- func: outer(Tensor self, Tensor vec2) -> Tensor + variants: function, method + +- func: outer.out(Tensor self, Tensor vec2, *, Tensor(a!) out) -> Tensor(a!) + +# torch.ger, alias for torch.outer +- func: ger(Tensor self, Tensor vec2) -> Tensor + variants: function, method + +- func: ger.out(Tensor self, Tensor vec2, *, Tensor(a!) out) -> Tensor(a!) + +- func: linalg_norm(Tensor self, Scalar? ord=None, int[1]? dim=None, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + python_module: linalg + variants: function + +- func: linalg_norm.ord_str(Tensor self, str ord, int[1]? dim=None, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + python_module: linalg + variants: function + +- func: linalg_norm.out(Tensor self, Scalar? ord=None, int[1]? dim=None, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + variants: function + +- func: linalg_norm.ord_str_out(Tensor self, str ord, int[1]? dim=None, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + variants: function + +- func: linalg_vector_norm(Tensor self, Scalar ord=2, int[1]? dim=None, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + python_module: linalg + variants: function + structured_delegate: linalg_vector_norm.out + tags: reduction + +- func: linalg_vector_norm.out(Tensor self, Scalar ord=2, int[1]? dim=None, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + structured: True + dispatch: + CPU, CUDA: linalg_vector_norm_out + MPS: linalg_vector_norm_out_mps + tags: reduction + +- func: linalg_matrix_norm(Tensor self, Scalar ord, int[] dim=[-2,-1], bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + python_module: linalg + +- func: linalg_matrix_norm.out(Tensor self, Scalar ord, int[] dim=[-2,-1], bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + +- func: linalg_matrix_norm.str_ord(Tensor self, str ord='fro', int[] dim=[-2,-1], bool keepdim=False, *, ScalarType? dtype=None) -> Tensor + python_module: linalg + +- func: linalg_matrix_norm.str_ord_out(Tensor self, str ord='fro', int[] dim=[-2,-1], bool keepdim=False, *, ScalarType? 
dtype=None, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + +# This function exposes the `compute_uv` flag, which is then used to implement `linalg.svd` and +# `linalg.svdvals` as composite functions that call this one. +- func: _linalg_svd(Tensor A, bool full_matrices=False, bool compute_uv=True, *, str? driver=None) -> (Tensor U, Tensor S, Tensor Vh) + variants: function + structured_delegate: _linalg_svd.U + +- func: _linalg_svd.U(Tensor A, bool full_matrices=False, bool compute_uv=True, *, str? driver=None, Tensor(a!) U, Tensor(b!) S, Tensor(c!) Vh) -> (Tensor(a!) U, Tensor(b!) S, Tensor(c!) Vh) + structured: True + dispatch: + CPU, CUDA: _linalg_svd_out + +- func: linalg_svd(Tensor A, bool full_matrices=True, *, str? driver=None) -> (Tensor U, Tensor S, Tensor Vh) + python_module: linalg + variants: function + +- func: linalg_svd.U(Tensor A, bool full_matrices=True, *, str? driver=None, Tensor(a!) U, Tensor(b!) S, Tensor(c!) Vh) -> (Tensor(a!) U, Tensor(b!) S, Tensor(c!) Vh) + python_module: linalg + variants: function + +- func: linalg_svdvals(Tensor A, *, str? driver=None) -> Tensor + python_module: linalg + variants: function + +- func: linalg_svdvals.out(Tensor A, *, str? driver=None, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + variants: function + +- func: linalg_cond(Tensor self, Scalar? p=None) -> Tensor + python_module: linalg + variants: function + +- func: linalg_cond.out(Tensor self, Scalar? p=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + variants: function + +- func: linalg_cond.p_str(Tensor self, str p) -> Tensor + python_module: linalg + variants: function + +- func: linalg_cond.p_str_out(Tensor self, str p, *, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + variants: function + +- func: linalg_pinv.atol_rtol_tensor(Tensor self, *, Tensor? atol=None, Tensor? rtol=None, bool hermitian=False) -> Tensor + python_module: linalg + variants: function + dispatch: + # calls svd, which calls mH() (view op) + # also calls narrow() + CompositeExplicitAutogradNonFunctional: linalg_pinv + +- func: linalg_pinv.atol_rtol_tensor_out(Tensor self, *, Tensor? atol=None, Tensor? rtol=None, bool hermitian=False, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + variants: function + dispatch: + CompositeExplicitAutograd: linalg_pinv_out + +- func: linalg_pinv.atol_rtol_float(Tensor self, *, float? atol=None, float? rtol=None, bool hermitian=False) -> Tensor + cpp_no_default_args: ['atol', 'rtol'] + python_module: linalg + variants: function + +- func: linalg_pinv.atol_rtol_float_out(Tensor self, *, float? atol=None, float? rtol=None, bool hermitian=False, Tensor(a!) out) -> Tensor(a!) + cpp_no_default_args: ['atol', 'rtol'] + python_module: linalg + variants: function + +- func: linalg_pinv(Tensor self, float rcond, bool hermitian=False) -> Tensor + python_module: linalg + variants: function + +- func: linalg_pinv.rcond_tensor(Tensor self, Tensor rcond, bool hermitian=False) -> Tensor + python_module: linalg + variants: function + +- func: linalg_pinv.out(Tensor self, float rcond, bool hermitian=False, *, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + variants: function +
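Likewise for the `compute_uv` comment above: linalg.svd and linalg.svdvals are composites over _linalg_svd, and svdvals only needs the singular values. A minimal sketch, assuming the standard torch.linalg bindings:

import torch

a = torch.randn(5, 3)
u, s, vh = torch.linalg.svd(a, full_matrices=False)
print(torch.allclose(u @ torch.diag(s) @ vh, a, atol=1e-5))  # reconstructs a
print(torch.allclose(torch.linalg.svdvals(a), s))            # values-only path

+- func: linalg_pinv.out_rcond_tensor(Tensor self, Tensor rcond, bool hermitian=False, *, Tensor(a!) out) -> Tensor(a!)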
+ python_module: linalg + variants: function + +- func: _linalg_solve_ex(Tensor A, Tensor B, *, bool left=True, bool check_errors=False) -> (Tensor result, Tensor LU, Tensor pivots, Tensor info) + structured_delegate: _linalg_solve_ex.result + +- func: _linalg_solve_ex.result(Tensor A, Tensor B, *, bool left=True, bool check_errors=False, Tensor(a!) result, Tensor(b!) LU, Tensor(c!) pivots, Tensor(d!) info) -> (Tensor(a!) result, Tensor(b!) LU, Tensor(c!) pivots, Tensor(d!) info) + structured: True + dispatch: + CPU, CUDA: _linalg_solve_ex_out + MPS: _linalg_solve_ex_out_mps + +- func: linalg_solve_ex(Tensor A, Tensor B, *, bool left=True, bool check_errors=False) -> (Tensor result, Tensor info) + python_module: linalg + +- func: linalg_solve_ex.out(Tensor A, Tensor B, *, bool left=True, bool check_errors=False, Tensor(a!) result, Tensor(b!) info) -> (Tensor(a!) result, Tensor(b!) info) + python_module: linalg + +- func: linalg_solve(Tensor A, Tensor B, *, bool left=True) -> Tensor + python_module: linalg + +- func: _spsolve(Tensor A, Tensor B, *, bool left=True) -> Tensor + python_module: sparse + dispatch: + SparseCsrCUDA: _sparse_csr_linear_solve + +- func: linalg_solve.out(Tensor A, Tensor B, *, bool left=True, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + +- func: linalg_tensorinv(Tensor self, int ind=2) -> Tensor + python_module: linalg + variants: function + +- func: linalg_tensorinv.out(Tensor self, int ind=2, *, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + variants: function + +- func: linalg_tensorsolve(Tensor self, Tensor other, int[]? dims=None) -> Tensor + python_module: linalg + variants: function + +- func: linalg_tensorsolve.out(Tensor self, Tensor other, int[]? dims=None, *, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + variants: function + +- func: linalg_qr(Tensor A, str mode='reduced') -> (Tensor Q, Tensor R) + python_module: linalg + variants: function + structured_delegate: linalg_qr.out + +- func: linalg_qr.out(Tensor A, str mode='reduced', *, Tensor(a!) Q, Tensor(b!) R) -> (Tensor(a!) Q, Tensor(b!) R) + python_module: linalg + structured: True + dispatch: + CPU, CUDA: linalg_qr_out + +- func: linalg_matrix_power(Tensor self, int n) -> Tensor + python_module: linalg + +- func: linalg_matrix_power.out(Tensor self, int n, *, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + +- func: linalg_matrix_rank.atol_rtol_tensor(Tensor input, *, Tensor? atol=None, Tensor? rtol=None, bool hermitian=False) -> Tensor + python_module: linalg + variants: function + +- func: linalg_matrix_rank.atol_rtol_tensor_out(Tensor input, *, Tensor? atol=None, Tensor? rtol=None, bool hermitian=False, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + variants: function + +- func: linalg_matrix_rank.atol_rtol_float(Tensor self, *, float? atol=None, float? rtol=None, bool hermitian=False) -> Tensor + cpp_no_default_args: ['atol', 'rtol'] + python_module: linalg + variants: function + +- func: linalg_matrix_rank.atol_rtol_float_out(Tensor self, *, float? atol=None, float? rtol=None, bool hermitian=False, Tensor(a!) out) -> Tensor(a!) + cpp_no_default_args: ['atol', 'rtol'] + python_module: linalg + variants: function + +- func: linalg_matrix_rank(Tensor self, float tol, bool hermitian=False) -> Tensor + python_module: linalg + variants: function + +- func: linalg_matrix_rank.out(Tensor self, float tol, bool hermitian=False, *, Tensor(a!) out) -> Tensor(a!) 
+ python_module: linalg + variants: function + +- func: linalg_matrix_rank.tol_tensor(Tensor input, Tensor tol, bool hermitian=False) -> Tensor + python_module: linalg + variants: function + +- func: linalg_matrix_rank.out_tol_tensor(Tensor input, Tensor tol, bool hermitian=False, *, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + variants: function + +- func: linalg_multi_dot(Tensor[] tensors) -> Tensor + python_module: linalg + +- func: linalg_multi_dot.out(Tensor[] tensors, *, Tensor(a!) out) -> Tensor(a!) + python_module: linalg + +## Functions related to the `torch.nested` namespace +# Note [nested namespace binding] +# Functions in the nested python module should have their names start with +# "nested_" and be bound to the desired Python name in +# torch/nested/__init__.py, and the desired C++ name in torch/csrc/api/include/torch/nested.h. +# The "nested_" names should be hidden from the user and not documented. + +- func: nested_to_padded_tensor(Tensor self, float padding, int[]? output_size=None) -> Tensor + python_module: nested + variants: function + +## Functions that are only for testing +# They are undocumented and should not be used outside of tests. +- func: _test_serialization_subcmul(Tensor self, Tensor other, Scalar alpha=1) -> Tensor + +# Note: for testing COW materialization within `at::parallel_for` loop function +- func: _test_parallel_materialize(Tensor self, int num_parallel, bool skip_first=False) -> Tensor + variants: function + dispatch: + CompositeExplicitAutograd: _test_parallel_materialize + +# Note: this function is only for testing. +- func: _test_optional_intlist(Tensor values, int[]? addends) -> Tensor + python_module: nn + dispatch: + CPU: _test_optional_intlist + autogen: _test_optional_intlist.out + +# Note: this function is only for testing. +- func: _test_optional_filled_intlist(Tensor values, int[2]? addends) -> Tensor + python_module: nn + dispatch: + CPU: _test_optional_intlist + autogen: _test_optional_filled_intlist.out + +# Note: this function is only for testing. +- func: _test_optional_floatlist(Tensor values, float[]? addends) -> Tensor + python_module: nn + dispatch: + CPU: _test_optional_floatlist + autogen: _test_optional_floatlist.out + +# Note: this function is only for testing. +- func: _test_string_default(Tensor dummy, str a="\"'\\", str b='"\'\\') -> Tensor + python_module: nn + +# Note: this function is only for testing. +- func: _test_ambiguous_defaults.a(Tensor dummy, int a=1, int b=1) -> Tensor + python_module: nn + +# Note: this function is only for testing. +- func: _test_ambiguous_defaults.b(Tensor dummy, int a=2, str b="2") -> Tensor + cpp_no_default_args: ['a', 'b'] + python_module: nn + +# Note: this function is only for testing. +- func: _test_warn_in_autograd(Tensor self) -> Tensor + python_module: nn + dispatch: + CompositeExplicitAutograd: _test_warn_in_autograd + autogen: _test_warn_in_autograd.out + +# Note: this function is only for testing. +- func: _test_autograd_multiple_dispatch.fullcoverage(Tensor self) -> Tensor + dispatch: + # the NestedTensor keys are necessary because NestedTensor has been removed + # from the CompositeExplicitAutograd keyset; see Note [NestedTensor Not Included in Backend Keys] + CompositeExplicitAutograd, NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: _test_autograd_multiple_dispatch_fullcoverage + autogen: _test_autograd_multiple_dispatch.fullcoverage_out +
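Stepping back from the test-only entries for a moment: per the `torch.nested` note above, nested_to_padded_tensor is bound as torch.nested.to_padded_tensor. A minimal sketch, assuming the standard torch.nested bindings:

import torch

# Ragged rows are right-padded with the given value to a dense rectangle.
nt = torch.nested.nested_tensor([torch.ones(2), torch.ones(4)])
padded = torch.nested.to_padded_tensor(nt, padding=0.0)
print(padded.shape)  # torch.Size([2, 4])

+# Note: this function is only for testing.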
+- func: _test_autograd_multiple_dispatch.ntonly(Tensor self, bool b) -> Tensor + dispatch: + CompositeImplicitAutograd, NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: _test_autograd_multiple_dispatch_ntonly + +# Note: this function is only for testing. +- func: _test_autograd_multiple_dispatch_view(Tensor(a) self) -> Tensor(a) + dispatch: + CompositeExplicitAutograd: _test_autograd_multiple_dispatch_view + +# Note: this function is only for testing. +- func: _test_autograd_multiple_dispatch_view_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: _test_autograd_multiple_dispatch_view_copy + tags: view_copy + autogen: _test_autograd_multiple_dispatch_view_copy.out + +- func: segment_reduce(Tensor data, str reduce, *, Tensor? lengths=None, Tensor? indices=None, Tensor? offsets=None, int axis=0, bool unsafe=False, Scalar? initial=None) -> Tensor + variants: function + dispatch: + CPU, CUDA: segment_reduce_kernel + autogen: segment_reduce.out + +- func: _segment_reduce_backward(Tensor grad, Tensor output, Tensor data, str reduce, *, Tensor? lengths=None, Tensor? offsets=None, int axis=0, Scalar? initial=None) -> Tensor + variants: function + dispatch: + CPU, CUDA: _segment_reduce_backward_kernel + autogen: _segment_reduce_backward.out + +- func: pad_sequence(Tensor[] sequences, bool batch_first=False, float padding_value=0.0, str padding_side="right") -> Tensor + python_module: nn + variants: function + +- func: flatten_dense_tensors(Tensor[] tensors) -> Tensor + variants: function + python_module: nn + +- func: unflatten_dense_tensors(Tensor flat, Tensor[] tensors) -> Tensor[] + variants: function + python_module: nn + +- func: _nested_tensor_from_tensor_list(Tensor[] list, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + variants: function + dispatch: + CompositeExplicitAutograd: _nested_tensor_from_tensor_list + autogen: _nested_tensor_from_tensor_list.out + +- func: _fw_primal_copy(Tensor self, int level) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: _fw_primal_copy + tags: view_copy + autogen: _fw_primal_copy.out + +- func: _make_dual_copy(Tensor primal, Tensor tangent, int level) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: _make_dual_copy + tags: view_copy + autogen: _make_dual_copy.out + +- func: view_as_real_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: view_as_real_copy + tags: view_copy + autogen: view_as_real_copy.out + +- func: view_as_complex_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: view_as_complex_copy + tags: view_copy + autogen: view_as_complex_copy.out + +- func: _conj_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: _conj_copy + tags: view_copy + autogen: _conj_copy.out + +- func: _neg_view_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: _neg_view_copy + tags: view_copy + autogen: _neg_view_copy.out + +- func: as_strided_copy(Tensor self, SymInt[] size, SymInt[] stride, SymInt? 
storage_offset=None) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: as_strided_copy_symint + tags: view_copy + autogen: as_strided_copy.out + +- func: _sparse_broadcast_to_copy(Tensor self, int[] size) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: _sparse_broadcast_to_copy + tags: view_copy + autogen: _sparse_broadcast_to_copy.out + +- func: diagonal_copy(Tensor self, int offset=0, int dim1=0, int dim2=1) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: diagonal_copy + tags: view_copy + autogen: diagonal_copy.out + +- func: expand_copy(Tensor self, SymInt[] size, *, bool implicit=False) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: expand_copy_symint + tags: view_copy + autogen: expand_copy.out + +- func: permute_copy(Tensor self, int[] dims) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: permute_copy + tags: view_copy + autogen: permute_copy.out + +- func: _reshape_alias_copy(Tensor self, SymInt[] size, SymInt[] stride) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: _reshape_alias_copy_symint + tags: view_copy + autogen: _reshape_alias_copy.out + +- func: select_copy.int(Tensor self, int dim, SymInt index) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: select_copy_symint + SparseCsrCPU, SparseCsrCUDA, SparseCsrMeta: select_copy_sparse_csr + tags: view_copy + autogen: select_copy.int_out + +- func: detach_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: detach_copy + tags: view_copy + autogen: detach_copy.out + +- func: slice_copy.Tensor(Tensor self, int dim=0, SymInt? start=None, SymInt? 
end=None, SymInt step=1) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: slice_copy_Tensor_symint + tags: view_copy + autogen: slice_copy.Tensor_out + +- func: split_copy.Tensor(Tensor self, SymInt split_size, int dim=0) -> Tensor[] + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: split_copy_Tensor_symint + tags: view_copy + +- func: split_with_sizes_copy(Tensor self, SymInt[] split_sizes, int dim=0) -> Tensor[] + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: split_with_sizes_copy_symint + tags: view_copy + +- func: squeeze_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: squeeze_copy + tags: view_copy + autogen: squeeze_copy.out + +- func: squeeze_copy.dim(Tensor self, int dim) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: squeeze_copy_dim + tags: view_copy + autogen: squeeze_copy.dim_out + +- func: squeeze_copy.dims(Tensor self, int[] dim) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: squeeze_copy_dims + tags: view_copy + autogen: squeeze_copy.dims_out + +- func: t_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: t_copy + tags: view_copy + autogen: t_copy.out + +- func: transpose_copy.int(Tensor self, int dim0, int dim1) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: transpose_copy_int + tags: view_copy + autogen: transpose_copy.int_out + +- func: unsqueeze_copy(Tensor self, int dim) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: unsqueeze_copy + tags: view_copy + autogen: unsqueeze_copy.out + +- func: _indices_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: _indices_copy + tags: view_copy + autogen: _indices_copy.out + +- func: _values_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: _values_copy + tags: view_copy + autogen: _values_copy.out + +- func: indices_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: indices_copy + tags: view_copy + autogen: indices_copy.out + +- func: values_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: values_copy + tags: view_copy + autogen: values_copy.out + +- func: crow_indices_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: crow_indices_copy + tags: view_copy + autogen: crow_indices_copy.out + +- func: col_indices_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: col_indices_copy + tags: view_copy + autogen: col_indices_copy.out + +- func: ccol_indices_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: ccol_indices_copy + tags: view_copy + autogen: ccol_indices_copy.out + +- func: row_indices_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: row_indices_copy + tags: view_copy + autogen: row_indices_copy.out + +- func: unbind_copy.int(Tensor self, int dim=0) -> Tensor[] + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: unbind_copy_int + tags: view_copy + +- func: unbind_copy.int_out(Tensor self, int dim=0, *, Tensor(a!)[] out) -> () + variants: function + 
dispatch: + CompositeExplicitAutograd: unbind_copy_int_out + +- func: split_copy.Tensor_out(Tensor self, SymInt split_size, int dim=0, *, Tensor(a!)[] out) -> () + variants: function + dispatch: + CompositeExplicitAutograd: split_copy_Tensor_out + + +- func: split_with_sizes_copy.out(Tensor self, SymInt[] split_sizes, int dim=0, *, Tensor(a!)[] out) -> () + variants: function + dispatch: + CompositeExplicitAutograd: split_with_sizes_copy_out + CUDA: split_with_sizes_copy_out_cuda + +- func: view_copy(Tensor self, SymInt[] size) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: view_copy_symint + tags: view_copy + autogen: view_copy.out + +- func: view_copy.dtype(Tensor self, ScalarType dtype) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: view_copy_dtype + tags: view_copy + autogen: view_copy.dtype_out + +- func: unfold_copy(Tensor self, int dimension, int size, int step) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: unfold_copy + tags: view_copy + autogen: unfold_copy.out + +- func: alias_copy(Tensor self) -> Tensor + variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: alias_copy + tags: view_copy + autogen: alias_copy.out + +- func: to_padded_tensor(Tensor self, float padding, SymInt[]? output_size=None) -> Tensor + variants: method + dispatch: + NestedTensorCPU: NestedTensor_to_padded_tensor_generic + NestedTensorCUDA: NestedTensor_to_padded_tensor_cuda + autogen: to_padded_tensor.out + +- func: _jagged_to_padded_dense_forward(Tensor values, Tensor[] offsets, SymInt[] max_lengths, float padding_value=0.0) -> Tensor + variants: function + dispatch: + CUDA: _fbgemm_jagged_to_padded_dense_forward + CPU: _jagged_to_padded_dense_forward_cpu + +- func: _padded_dense_to_jagged_forward(Tensor dense, Tensor[] offsets, SymInt? total_L=None) -> Tensor + variants: function + dispatch: + CUDA: _fbgemm_dense_to_jagged_forward_symint + CPU: _padded_dense_to_jagged_forward_cpu + +- func: _nested_from_padded_tensor(Tensor padded, Tensor offsets, Tensor dummy, int ragged_idx=1, Tensor? min_seqlen=None, Tensor? max_seqlen=None, SymInt? sum_S=None) -> Tensor + variants: function + device_check: NoCheck + dispatch: {} + +- func: _nested_tensor_softmax_with_shape(Tensor self, Tensor query) -> Tensor + dispatch: + NestedTensorCPU: NestedTensor_softmax_dropout + NestedTensorCUDA: NestedTensor_softmax_dropout_cuda + tags: nondeterministic_seeded + +- func: _safe_softmax(Tensor self, int dim, ScalarType? dtype=None) -> Tensor + dispatch: + CompositeExplicitAutograd: _safe_softmax + NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: _safe_softmax + +# Apparently, putting "forward" in the name will cause Python bindings to be skipped, so "fwd" it is. +- func: _transformer_encoder_layer_fwd(Tensor src, int embed_dim, int num_heads, Tensor qkv_weight, Tensor qkv_bias, Tensor proj_weight, Tensor proj_bias, bool use_gelu, bool norm_first, float eps, Tensor norm_weight_1, Tensor norm_bias_1, Tensor norm_weight_2, Tensor norm_bias_2, Tensor ffn_weight_1, Tensor ffn_bias_1, Tensor ffn_weight_2, Tensor ffn_bias_2, Tensor? mask=None, int? 
mask_type=None) -> Tensor + variants: function + dispatch: + CPU, CUDA, NestedTensorCPU, NestedTensorHPU, NestedTensorCUDA: transformer_encoder_layer_forward + autogen: _transformer_encoder_layer_fwd.out + +- func: _native_multi_head_attention(Tensor query, Tensor key, Tensor value, int embed_dim, int num_head, Tensor qkv_weight, Tensor qkv_bias, Tensor proj_weight, Tensor proj_bias, Tensor? mask=None, bool need_weights=True, bool average_attn_weights=True, int? mask_type=None) -> (Tensor, Tensor) + variants: function + dispatch: + CPU, NestedTensorCPU: native_multi_head_attention_cpu + CUDA, NestedTensorCUDA: native_multi_head_attention_cuda + autogen: _native_multi_head_attention.out + +- func: scaled_dot_product_attention(Tensor query, Tensor key, Tensor value, Tensor? attn_mask=None, float dropout_p=0.0, bool is_causal=False, *, float? scale=None, bool enable_gqa=False) -> Tensor + python_module: nn + variants: function + autogen: scaled_dot_product_attention.out + tags: nondeterministic_seeded + +# This aten function is kept so that we can test the choice function from Python +- func: _fused_sdp_choice(Tensor query, Tensor key, Tensor value, Tensor? attn_mask=None, float dropout_p=0.0, bool is_causal=False, *, float? scale=None, bool enable_gqa=False) -> int + dispatch: + Meta: _fused_sdp_choice_meta + CPU, NestedTensorCPU: _fused_sdp_choice_cpp + CUDA, NestedTensorCUDA: _fused_sdp_choice_cuda + XPU: _fused_sdp_choice_xpu + tags: nondeterministic_seeded + +- func: _scaled_dot_product_attention_math(Tensor query, Tensor key, Tensor value, Tensor? attn_mask=None, float dropout_p=0.0, bool is_causal=False, Tensor? dropout_mask=None, *, float? scale=None, bool enable_gqa=False) -> (Tensor, Tensor) + variants: function + tags: nondeterministic_seeded + +- func: _scaled_dot_product_attention_math_for_mps(Tensor query, Tensor key, Tensor value, Tensor? attn_mask=None, float dropout_p=0.0, bool is_causal=False, Tensor? dropout_mask=None, *, float? scale=None) -> (Tensor, Tensor) + dispatch: + MPS: _scaled_dot_product_attention_math_mps + tags: nondeterministic_seeded + +- func: _scaled_dot_product_flash_attention(Tensor query, Tensor key, Tensor value, float dropout_p=0.0, bool is_causal=False, bool return_debug_mask=False, *, float? scale=None) -> (Tensor output, Tensor logsumexp, Tensor cum_seq_q, Tensor cum_seq_k, SymInt max_q, SymInt max_k, Tensor rng_state, Tensor unused, Tensor debug_attn_mask) + dispatch: + CUDA: _scaled_dot_product_flash_attention_cuda + XPU: _scaled_dot_product_flash_attention_xpu + NestedTensorCUDA: _scaled_dot_product_flash_attention_nestedtensor_cuda + tags: nondeterministic_seeded + +- func: _scaled_dot_product_flash_attention_for_cpu(Tensor query, Tensor key, Tensor value, float dropout_p=0.0, bool is_causal=False, *, Tensor? attn_mask=None, float? scale=None) -> (Tensor output, Tensor logsumexp) + dispatch: + CPU: _scaled_dot_product_flash_attention_cpu + tags: nondeterministic_seeded + +- func: _scaled_dot_product_fused_attention_overrideable(Tensor query, Tensor key, Tensor value, Tensor? attn_bias=None, float dropout_p=0.0, bool is_causal=False, bool return_debug_mask=False, *, float? 
scale=None) -> (Tensor output, Tensor logsumexp, Tensor cum_seq_q, Tensor cum_seq_k, SymInt max_q, SymInt max_k, Tensor philox_seed, Tensor philox_offset, Tensor debug_attn_mask) + dispatch: + CompositeExplicitAutograd: _scaled_dot_product_fused_attention_overrideable + XPU: _scaled_dot_product_fused_attention_overrideable_xpu + tags: nondeterministic_seeded + +- func: _scaled_dot_product_flash_attention_backward(Tensor grad_out, Tensor query, Tensor key, Tensor value, Tensor out, Tensor logsumexp, Tensor cum_seq_q, Tensor cum_seq_k, SymInt max_q, SymInt max_k, float dropout_p, bool is_causal, Tensor philox_seed, Tensor philox_offset, *, float? scale=None) -> (Tensor grad_query, Tensor grad_key, Tensor grad_value) + device_check: NoCheck + variants: function + dispatch: + CUDA: _scaled_dot_product_flash_attention_backward_cuda + XPU: _scaled_dot_product_flash_attention_backward_xpu + NestedTensorCUDA: _scaled_dot_product_flash_attention_backward_nested + +- func: _scaled_dot_product_flash_attention_for_cpu_backward(Tensor grad_out, Tensor query, Tensor key, Tensor value, Tensor out, Tensor logsumexp, float dropout_p, bool is_causal, *, Tensor? attn_mask=None, float? scale=None) -> (Tensor grad_query, Tensor grad_key, Tensor grad_value) + device_check: NoCheck + variants: function + dispatch: + CPU: _scaled_dot_product_flash_attention_cpu_backward + +- func: _scaled_dot_product_fused_attention_overrideable_backward(Tensor grad_out, Tensor query, Tensor key, Tensor value, Tensor attn_bias, bool[4] grad_input_mask, Tensor out, Tensor logsumexp, Tensor cum_seq_q, Tensor cum_seq_k, SymInt max_q, SymInt max_k, float dropout_p, bool is_causal, Tensor philox_seed, Tensor philox_offset, *, float? scale=None) -> (Tensor grad_query, Tensor grad_key, Tensor grad_value, Tensor grad_attn_bias) + device_check: NoCheck + variants: function + dispatch: + CompositeExplicitAutograd: _scaled_dot_product_fused_attention_overrideable_backward + +- func: _scaled_dot_product_efficient_attention(Tensor query, Tensor key, Tensor value, Tensor? attn_bias, bool compute_log_sumexp, float dropout_p=0.0, bool is_causal=False, *, float? scale=None) -> (Tensor output, Tensor log_sumexp, Tensor philox_seed, Tensor philox_offset) + dispatch: + CUDA: _scaled_dot_product_efficient_attention_cuda + NestedTensorCUDA: _scaled_dot_product_efficient_attention_nestedtensor_cuda + tags: nondeterministic_seeded + +- func: _scaled_dot_product_efficient_attention_backward(Tensor grad_out_, Tensor query, Tensor key, Tensor value, Tensor attn_bias, Tensor out, Tensor logsumexp, Tensor philox_seed, Tensor philox_offset, float dropout_p, bool[4] grad_input_mask, bool is_causal=False, *, float? scale=None) -> (Tensor, Tensor, Tensor, Tensor) + device_check: NoCheck + dispatch: + CUDA: _scaled_dot_product_efficient_attention_backward_cuda + tags: nondeterministic_seeded + +- func: _scaled_dot_product_cudnn_attention(Tensor query, Tensor key, Tensor value, Tensor? attn_bias, bool compute_log_sumexp, float dropout_p=0.0, bool is_causal=False, bool return_debug_mask=False, *, float? 
scale=None) -> (Tensor output, Tensor logsumexp, Tensor cum_seq_q, Tensor cum_seq_k, SymInt max_q, SymInt max_k, Tensor philox_seed, Tensor philox_offset, Tensor debug_attn_mask) + dispatch: + CUDA: _scaled_dot_product_cudnn_attention_cuda + NestedTensorCUDA: _scaled_dot_product_cudnn_attention_nestedtensor_cuda + tags: nondeterministic_seeded + +- func: _scaled_dot_product_cudnn_attention_backward(Tensor grad_out, Tensor query, Tensor key, Tensor value, Tensor out, Tensor logsumexp, Tensor philox_seed, Tensor philox_offset, Tensor attn_bias, Tensor cum_seq_q, Tensor cum_seq_k, SymInt max_q, SymInt max_k, float dropout_p, bool is_causal, *, float? scale=None) -> (Tensor, Tensor, Tensor) + dispatch: + CUDA: _scaled_dot_product_cudnn_attention_backward_cuda + NestedTensorCUDA: _scaled_dot_product_cudnn_attention_nestedtensor_backward_cuda + tags: nondeterministic_seeded + +- func: _flash_attention_forward(Tensor query, Tensor key, Tensor value, Tensor? cum_seq_q, Tensor? cum_seq_k, SymInt max_q, SymInt max_k, float dropout_p, bool is_causal, bool return_debug_mask, *, float? scale=None, SymInt? window_size_left=None, SymInt? window_size_right=None, Tensor? seqused_k=None, Tensor? alibi_slopes=None) -> (Tensor output, Tensor softmax_logsumexp, Tensor rng_state, Tensor unused, Tensor debug_attn_mask) + variants: function + dispatch: + CUDA: _flash_attention_forward + tags: nondeterministic_seeded + +- func: _flash_attention_backward(Tensor grad_out, Tensor query, Tensor key, Tensor value, Tensor out, Tensor logsumexp, Tensor cum_seq_q, Tensor cum_seq_k, SymInt max_q, SymInt max_k, float dropout_p, bool is_causal, Tensor rng_state, Tensor unused, *, float? scale=None, SymInt? window_size_left=None, SymInt? window_size_right=None) -> (Tensor, Tensor, Tensor) + device_check: NoCheck + variants: function + dispatch: + CUDA: _flash_attention_backward + +# Returns output, logsumexp if compute_logsumexp +- func: _efficient_attention_forward(Tensor query, Tensor key, Tensor value, Tensor? bias, Tensor? cu_seqlens_q, Tensor? cu_seqlens_k, SymInt? max_seqlen_q, SymInt? max_seqlen_k, float dropout_p, int custom_mask_type, bool compute_log_sumexp=False, *, float? scale=None, Tensor? seqlen_k=None, int? window_size=None) -> (Tensor output, Tensor logsumexp, Tensor philox_seed, Tensor philox_offset, SymInt max_seqlen_batch_q, SymInt max_seqlen_batch_k) + variants: function + dispatch: + CUDA: _efficient_attention_forward + tags: nondeterministic_seeded + +- func: _efficient_attention_backward(Tensor grad_out_, Tensor query, Tensor key, Tensor value, Tensor? bias, Tensor out, Tensor? cu_seqlens_q, Tensor? cu_seqlens_k, SymInt max_seqlen_q, SymInt max_seqlen_k, Tensor logsumexp, float dropout_p, Tensor philox_seed, Tensor philox_offset, int custom_mask_type, bool bias_requires_grad, *, float? scale=None, int? num_splits_key=None, int? window_size=None, bool shared_storage_dqdkdv=False) -> (Tensor, Tensor, Tensor, Tensor) + device_check: NoCheck + variants: function + dispatch: + CUDA: _efficient_attention_backward + +- func: _cudnn_attention_forward(Tensor query, Tensor key, Tensor value, Tensor? attn_bias, Tensor? cum_seq_q, Tensor? cum_seq_k, SymInt max_q, SymInt max_k, bool compute_log_sumexp, float dropout_p=0.0, bool is_causal=False, bool return_debug_mask=False, *, float? 
scale=None) -> (Tensor output, Tensor logsumexp, Tensor cum_seq_q, Tensor cum_seq_k, SymInt max_q, SymInt max_k, Tensor philox_seed, Tensor philox_offset, Tensor debug_attn_mask) + dispatch: + CUDA: _cudnn_attention_forward + tags: nondeterministic_seeded + +- func: _cudnn_attention_backward(Tensor grad_out, Tensor query, Tensor key, Tensor value, Tensor out, Tensor logsumexp, Tensor philox_seed, Tensor philox_offset, Tensor attn_bias, Tensor cum_seq_q, Tensor cum_seq_k, SymInt max_q, SymInt max_k, float dropout_p, bool is_causal, *, float? scale=None) -> (Tensor, Tensor, Tensor) + dispatch: + CUDA: _cudnn_attention_backward + tags: nondeterministic_seeded + +- func: _triton_scaled_dot_attention(Tensor q, Tensor k, Tensor v, float dropout_p=0.0) -> Tensor + variants: function + dispatch: + CUDA: triton_scaled_dot_attention + tags: nondeterministic_seeded + autogen: _triton_scaled_dot_attention.out + +- func: _fill_mem_eff_dropout_mask_(Tensor(a!) self, float dropout_p, int seed, int offset) -> Tensor(a!) + variants: function + dispatch: + CUDA: _fill_mem_eff_dropout_mask_ + tags: nondeterministic_seeded + +- func: _triton_multi_head_attention(Tensor query, Tensor key, Tensor value, int embed_dim, int num_head, Tensor qkv_weight, Tensor qkv_bias, Tensor proj_weight, Tensor proj_bias, Tensor? mask=None) -> Tensor + variants: function + dispatch: + CUDA: triton_multi_head_attention + autogen: _triton_multi_head_attention.out + +- func: special_airy_ai(Tensor x) -> Tensor + python_module: special + structured_delegate: special_airy_ai.out + variants: function + tags: pointwise + +- func: special_airy_ai.out(Tensor x, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA: special_airy_ai_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_bessel_j0(Tensor self) -> Tensor + python_module: special + structured_delegate: special_bessel_j0.out + variants: function + tags: pointwise + +- func: special_bessel_j0.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA, MPS: special_bessel_j0_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_bessel_j1(Tensor self) -> Tensor + python_module: special + structured_delegate: special_bessel_j1.out + variants: function + tags: pointwise + +- func: special_bessel_j1.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA, MPS: special_bessel_j1_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_bessel_y0(Tensor self) -> Tensor + python_module: special + structured_delegate: special_bessel_y0.out + variants: function + tags: pointwise + +- func: special_bessel_y0.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA, MPS: special_bessel_y0_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_bessel_y1(Tensor self) -> Tensor + python_module: special + structured_delegate: special_bessel_y1.out + variants: function + tags: pointwise + +- func: special_bessel_y1.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) 
+ dispatch: + CPU, CUDA, MPS: special_bessel_y1_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_t(Tensor x, Tensor n) -> Tensor + device_check: NoCheck + python_module: special + structured_delegate: special_chebyshev_polynomial_t.out + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_t.x_scalar(Scalar x, Tensor n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_chebyshev_polynomial_t + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_t.n_scalar(Tensor x, Scalar n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_chebyshev_polynomial_t + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_t.out(Tensor x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck + dispatch: + CPU, CUDA, MPS: special_chebyshev_polynomial_t_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_t.x_scalar_out(Scalar x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_chebyshev_polynomial_t_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_t.n_scalar_out(Tensor x, Scalar n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_chebyshev_polynomial_t_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_u(Tensor x, Tensor n) -> Tensor + device_check: NoCheck + python_module: special + structured_delegate: special_chebyshev_polynomial_u.out + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_u.x_scalar(Scalar x, Tensor n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_chebyshev_polynomial_u + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_u.n_scalar(Tensor x, Scalar n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_chebyshev_polynomial_u + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_u.out(Tensor x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck + dispatch: + CPU, CUDA, MPS: special_chebyshev_polynomial_u_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_u.x_scalar_out(Scalar x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_chebyshev_polynomial_u_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_u.n_scalar_out(Tensor x, Scalar n, *, Tensor(a!) out) -> Tensor(a!) 
+ dispatch: + CompositeExplicitAutograd: special_chebyshev_polynomial_u_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_v(Tensor x, Tensor n) -> Tensor + device_check: NoCheck + python_module: special + structured_delegate: special_chebyshev_polynomial_v.out + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_v.x_scalar(Scalar x, Tensor n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_chebyshev_polynomial_v + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_v.n_scalar(Tensor x, Scalar n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_chebyshev_polynomial_v + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_v.out(Tensor x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck + dispatch: + CPU, CUDA, MPS: special_chebyshev_polynomial_v_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_v.x_scalar_out(Scalar x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_chebyshev_polynomial_v_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_v.n_scalar_out(Tensor x, Scalar n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_chebyshev_polynomial_v_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_w(Tensor x, Tensor n) -> Tensor + device_check: NoCheck + python_module: special + structured_delegate: special_chebyshev_polynomial_w.out + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_w.x_scalar(Scalar x, Tensor n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_chebyshev_polynomial_w + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_w.n_scalar(Tensor x, Scalar n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_chebyshev_polynomial_w + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_w.out(Tensor x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck + dispatch: + CPU, CUDA, MPS: special_chebyshev_polynomial_w_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_w.x_scalar_out(Scalar x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_chebyshev_polynomial_w_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_chebyshev_polynomial_w.n_scalar_out(Tensor x, Scalar n, *, Tensor(a!) out) -> Tensor(a!) 
+ dispatch: + CompositeExplicitAutograd: special_chebyshev_polynomial_w_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_hermite_polynomial_h(Tensor x, Tensor n) -> Tensor + device_check: NoCheck + python_module: special + structured_delegate: special_hermite_polynomial_h.out + variants: function + tags: pointwise + +- func: special_hermite_polynomial_h.x_scalar(Scalar x, Tensor n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_hermite_polynomial_h + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_hermite_polynomial_h.n_scalar(Tensor x, Scalar n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_hermite_polynomial_h + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_hermite_polynomial_h.out(Tensor x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck + dispatch: + CPU, CUDA, MPS: special_hermite_polynomial_h_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_hermite_polynomial_h.x_scalar_out(Scalar x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_hermite_polynomial_h_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_hermite_polynomial_h.n_scalar_out(Tensor x, Scalar n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_hermite_polynomial_h_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_hermite_polynomial_he(Tensor x, Tensor n) -> Tensor + device_check: NoCheck + python_module: special + structured_delegate: special_hermite_polynomial_he.out + variants: function + tags: pointwise + +- func: special_hermite_polynomial_he.x_scalar(Scalar x, Tensor n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_hermite_polynomial_he + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_hermite_polynomial_he.n_scalar(Tensor x, Scalar n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_hermite_polynomial_he + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_hermite_polynomial_he.out(Tensor x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck + dispatch: + CPU, CUDA, MPS: special_hermite_polynomial_he_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_hermite_polynomial_he.x_scalar_out(Scalar x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_hermite_polynomial_he_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_hermite_polynomial_he.n_scalar_out(Tensor x, Scalar n, *, Tensor(a!) out) -> Tensor(a!) 
+ dispatch: + CompositeExplicitAutograd: special_hermite_polynomial_he_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_laguerre_polynomial_l(Tensor x, Tensor n) -> Tensor + device_check: NoCheck + python_module: special + structured_delegate: special_laguerre_polynomial_l.out + variants: function + tags: pointwise + +- func: special_laguerre_polynomial_l.x_scalar(Scalar x, Tensor n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_laguerre_polynomial_l + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_laguerre_polynomial_l.n_scalar(Tensor x, Scalar n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_laguerre_polynomial_l + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_laguerre_polynomial_l.out(Tensor x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck + dispatch: + CPU, CUDA: special_laguerre_polynomial_l_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_laguerre_polynomial_l.x_scalar_out(Scalar x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_laguerre_polynomial_l_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_laguerre_polynomial_l.n_scalar_out(Tensor x, Scalar n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_laguerre_polynomial_l_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_legendre_polynomial_p(Tensor x, Tensor n) -> Tensor + device_check: NoCheck + python_module: special + structured_delegate: special_legendre_polynomial_p.out + variants: function + tags: pointwise + +- func: special_legendre_polynomial_p.x_scalar(Scalar x, Tensor n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_legendre_polynomial_p + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_legendre_polynomial_p.n_scalar(Tensor x, Scalar n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_legendre_polynomial_p + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_legendre_polynomial_p.out(Tensor x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck + dispatch: + CPU, CUDA: special_legendre_polynomial_p_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_legendre_polynomial_p.x_scalar_out(Scalar x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_legendre_polynomial_p_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_legendre_polynomial_p.n_scalar_out(Tensor x, Scalar n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_legendre_polynomial_p_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_modified_bessel_i0(Tensor self) -> Tensor + python_module: special + structured_delegate: special_modified_bessel_i0.out + variants: function + tags: pointwise + +- func: special_modified_bessel_i0.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) 
+ dispatch: + CPU, CUDA, MPS: special_modified_bessel_i0_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_modified_bessel_i1(Tensor self) -> Tensor + python_module: special + structured_delegate: special_modified_bessel_i1.out + variants: function + tags: pointwise + +- func: special_modified_bessel_i1.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA, MPS: special_modified_bessel_i1_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_modified_bessel_k0(Tensor self) -> Tensor + python_module: special + structured_delegate: special_modified_bessel_k0.out + variants: function + tags: pointwise + +- func: special_modified_bessel_k0.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA, MPS: special_modified_bessel_k0_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_modified_bessel_k1(Tensor self) -> Tensor + python_module: special + structured_delegate: special_modified_bessel_k1.out + variants: function + tags: pointwise + +- func: special_modified_bessel_k1.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA, MPS: special_modified_bessel_k1_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_scaled_modified_bessel_k0(Tensor x) -> Tensor + python_module: special + structured_delegate: special_scaled_modified_bessel_k0.out + variants: function + tags: pointwise + +- func: special_scaled_modified_bessel_k0.out(Tensor x, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA, MPS: special_scaled_modified_bessel_k0_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_scaled_modified_bessel_k1(Tensor x) -> Tensor + python_module: special + structured_delegate: special_scaled_modified_bessel_k1.out + variants: function + tags: pointwise + +- func: special_scaled_modified_bessel_k1.out(Tensor x, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA, MPS: special_scaled_modified_bessel_k1_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_t(Tensor x, Tensor n) -> Tensor + device_check: NoCheck + python_module: special + structured_delegate: special_shifted_chebyshev_polynomial_t.out + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_t.x_scalar(Scalar x, Tensor n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_shifted_chebyshev_polynomial_t + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_t.n_scalar(Tensor x, Scalar n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_shifted_chebyshev_polynomial_t + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_t.out(Tensor x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) 
+ device_check: NoCheck + dispatch: + CPU, CUDA, MPS: special_shifted_chebyshev_polynomial_t_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_t.x_scalar_out(Scalar x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_shifted_chebyshev_polynomial_t_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_t.n_scalar_out(Tensor x, Scalar n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_shifted_chebyshev_polynomial_t_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_u(Tensor x, Tensor n) -> Tensor + device_check: NoCheck + python_module: special + structured_delegate: special_shifted_chebyshev_polynomial_u.out + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_u.x_scalar(Scalar x, Tensor n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_shifted_chebyshev_polynomial_u + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_u.n_scalar(Tensor x, Scalar n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_shifted_chebyshev_polynomial_u + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_u.out(Tensor x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck + dispatch: + CPU, CUDA, MPS: special_shifted_chebyshev_polynomial_u_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_u.x_scalar_out(Scalar x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_shifted_chebyshev_polynomial_u_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_u.n_scalar_out(Tensor x, Scalar n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_shifted_chebyshev_polynomial_u_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_v(Tensor x, Tensor n) -> Tensor + device_check: NoCheck + python_module: special + structured_delegate: special_shifted_chebyshev_polynomial_v.out + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_v.x_scalar(Scalar x, Tensor n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_shifted_chebyshev_polynomial_v + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_v.n_scalar(Tensor x, Scalar n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_shifted_chebyshev_polynomial_v + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_v.out(Tensor x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) 
+ device_check: NoCheck + dispatch: + CPU, CUDA, MPS: special_shifted_chebyshev_polynomial_v_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_v.x_scalar_out(Scalar x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_shifted_chebyshev_polynomial_v_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_v.n_scalar_out(Tensor x, Scalar n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_shifted_chebyshev_polynomial_v_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_w(Tensor x, Tensor n) -> Tensor + device_check: NoCheck + python_module: special + structured_delegate: special_shifted_chebyshev_polynomial_w.out + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_w.x_scalar(Scalar x, Tensor n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_shifted_chebyshev_polynomial_w + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_w.n_scalar(Tensor x, Scalar n) -> Tensor + dispatch: + CompositeExplicitAutograd: special_shifted_chebyshev_polynomial_w + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_w.out(Tensor x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck + dispatch: + CPU, CUDA, MPS: special_shifted_chebyshev_polynomial_w_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_w.x_scalar_out(Scalar x, Tensor n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_shifted_chebyshev_polynomial_w_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_shifted_chebyshev_polynomial_w.n_scalar_out(Tensor x, Scalar n, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CompositeExplicitAutograd: special_shifted_chebyshev_polynomial_w_out + device_check: NoCheck + python_module: special + variants: function + tags: pointwise + +- func: special_spherical_bessel_j0(Tensor x) -> Tensor + python_module: special + structured_delegate: special_spherical_bessel_j0.out + variants: function + tags: pointwise + +- func: special_spherical_bessel_j0.out(Tensor x, *, Tensor(a!) out) -> Tensor(a!) + dispatch: + CPU, CUDA, MPS: special_spherical_bessel_j0_out + python_module: special + structured_inherits: TensorIteratorBase + structured: True + variants: function + tags: pointwise + +# Aux function used in the test TestPythonDispatch.test_kwarg_only_and_positional_default +# within test/test_python_dispatch.py +- func: _foobar(Tensor self, bool arg1=True, bool arg2=True, *, bool arg3=True) -> Tensor + dispatch: + CPU: foobar + autogen: _foobar.out + +- func: _fused_adam_(Tensor(a!)[] self, Tensor(b!)[] grads, Tensor(c!)[] exp_avgs, Tensor(d!)[] exp_avg_sqs, Tensor(e!)[] max_exp_avg_sqs, Tensor[] state_steps, *, float lr, float beta1, float beta2, float weight_decay, float eps, bool amsgrad, bool maximize, Tensor? grad_scale=None, Tensor? 
found_inf=None) -> () + # Unlike "foreach" functions, lists of tensors should be guaranteed to be on the same device (for now). + variants: function + dispatch: + CPU: _fused_adam_kernel_cpu_ + CUDA: _fused_adam_kernel_cuda_ + MPS: _fused_adam_kernel_mps_ + autogen: _fused_adam, _fused_adam.out + +- func: _fused_adam_.tensor_lr(Tensor(a!)[] self, Tensor(b!)[] grads, Tensor(c!)[] exp_avgs, Tensor(d!)[] exp_avg_sqs, Tensor(e!)[] max_exp_avg_sqs, Tensor[] state_steps, *, Tensor lr, float beta1, float beta2, float weight_decay, float eps, bool amsgrad, bool maximize, Tensor? grad_scale=None, Tensor? found_inf=None) -> () + # Unlike "foreach" functions, lists of tensors should be guaranteed to be on the same device (for now), + # but still skip the device check as the Tensor LR can be on CPU + device_check: NoCheck + variants: function + dispatch: + CPU: _fused_adam_kernel_cpu_ + CUDA: _fused_adam_kernel_cuda_ + MPS: _fused_adam_kernel_mps_ + autogen: _fused_adam.tensor_lr, _fused_adam.tensor_lr_out + +- func: _fused_adamw_(Tensor(a!)[] self, Tensor(b!)[] grads, Tensor(c!)[] exp_avgs, Tensor(d!)[] exp_avg_sqs, Tensor(e!)[] max_exp_avg_sqs, Tensor[] state_steps, *, float lr, float beta1, float beta2, float weight_decay, float eps, bool amsgrad, bool maximize, Tensor? grad_scale=None, Tensor? found_inf=None) -> () + # Unlike "foreach" functions, lists of tensors should be guaranteed to be on the same device (for now). + variants: function + dispatch: + CPU: _fused_adamw_kernel_cpu_ + CUDA: _fused_adamw_kernel_cuda_ + MPS: _fused_adamw_kernel_mps_ + autogen: _fused_adamw, _fused_adamw.out + +- func: _fused_adamw_.tensor_lr(Tensor(a!)[] self, Tensor(b!)[] grads, Tensor(c!)[] exp_avgs, Tensor(d!)[] exp_avg_sqs, Tensor(e!)[] max_exp_avg_sqs, Tensor[] state_steps, *, Tensor lr, float beta1, float beta2, float weight_decay, float eps, bool amsgrad, bool maximize, Tensor? grad_scale=None, Tensor? found_inf=None) -> () + # Unlike "foreach" functions, lists of tensors should be guaranteed to be on the same device (for now), + # but still skip the device check as the Tensor LR can be on CPU + device_check: NoCheck + variants: function + dispatch: + CPU: _fused_adamw_kernel_cpu_ + CUDA: _fused_adamw_kernel_cuda_ + MPS: _fused_adamw_kernel_mps_ + autogen: _fused_adamw.tensor_lr, _fused_adamw.tensor_lr_out + +- func: _fused_sgd_(Tensor(a!)[] self, Tensor(b!)[] grads, Tensor(c!)[] momentum_buffer_list, *, float weight_decay, float momentum, float lr, float dampening, bool nesterov, bool maximize, bool is_first_step, Tensor? grad_scale=None, Tensor? found_inf=None) -> () + # Unlike "foreach" functions, lists of tensors should be guaranteed to be on the same device (for now). + variants: function + dispatch: + CPU: _fused_sgd_kernel_cpu_ + CUDA: _fused_sgd_kernel_cuda_ + MPS: _fused_sgd_kernel_mps_ + autogen: _fused_sgd, _fused_sgd.out + +- func: _fused_sgd_.tensor_lr(Tensor(a!)[] self, Tensor(b!)[] grads, Tensor(c!)[] momentum_buffer_list, *, float weight_decay, float momentum, Tensor lr, float dampening, bool nesterov, bool maximize, bool is_first_step, Tensor? grad_scale=None, Tensor? found_inf=None) -> () + # Unlike "foreach" functions, lists of tensors should be guaranteed to be on the same device (for now). 
+ # but still skip the device check as the Tensor LR can be on CPU + device_check: NoCheck + variants: function + dispatch: + CPU: _fused_sgd_kernel_cpu_ + CUDA: _fused_sgd_kernel_cuda_ + MPS: _fused_sgd_kernel_mps_ + autogen: _fused_sgd.tensor_lr, _fused_sgd.tensor_lr_out + +- func: _fused_adagrad_(Tensor(a!)[] self, Tensor(b!)[] grads, Tensor(c!)[] state_sums, Tensor(d!)[] state_steps, *, float lr, float lr_decay, float weight_decay, float eps, bool maximize, Tensor? grad_scale=None, Tensor? found_inf=None) -> () + variants: function + dispatch: + CPU: _fused_adagrad_kernel_cpu_ + CUDA: _fused_adagrad_kernel_cuda_ + autogen: _fused_adagrad, _fused_adagrad.out + +- func: _fused_adagrad_.tensor_lr(Tensor(a!)[] self, Tensor(b!)[] grads, Tensor(c!)[] state_sums, Tensor[] state_steps, *, Tensor lr, float lr_decay, float weight_decay, float eps, bool maximize, Tensor? grad_scale=None, Tensor? found_inf=None) -> () + device_check: NoCheck + variants: function + dispatch: + CPU: _fused_adagrad_kernel_cpu_ + CUDA: _fused_adagrad_kernel_cuda_ + autogen: _fused_adagrad.tensor_lr, _fused_adagrad.tensor_lr_out + +# This op is ONLY used by pytorch/XLA in functionalization, and should never show up in vanilla eager mode or in any pytorch tracing contexts. +- func: _propagate_xla_data(Tensor input, Tensor output) -> () + variants: function diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/native/tags.yaml b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/native/tags.yaml new file mode 100644 index 0000000000000000000000000000000000000000..6a53d4833adeb427c969753d8fe2adada1d64c60 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/native/tags.yaml @@ -0,0 +1,99 @@ +# This yaml file contains all the possible tags that can be defined in `tags` in `native_functions.yaml` + +- tag: inplace_view + desc: | + This tag indicates if an operator *only* modifies the tensor metadata +- tag: pt2_compliant_tag + desc: | + This tag indicates if the operator is guaranteed to + work with the PT2 compilation APIs (torch.compile, + torch.export, etc). If you add this tag to an + operator, please use + `torch.testing._internal.optest.opcheck` to test that + the operator has been registered correctly and + works with torch.compile +- tag: view_copy + desc: | + This tag indicates operators that are *_copy* variants + of view/aliasing operators. If an operator has a view_copy tag, + then it should have the name {op}_copy, where {op} is a view operator. +- tag: dynamic_output_shape + desc: | + This tag indicates if an operator's output's shape depends on input Tensor + data. +- tag: data_dependent_output + desc: | + Operator has a non-Tensor output whose value is dependent on the data + of Tensor inputs. Among other things, this implies that this operator + cannot be run with meta tensor (since data is not available), nor + can it be symbolically traced. +- tag: generated + desc: | + This tag indicates that the operator doesn't have an explicit entry in + native_functions.yaml, and instead was generated automatically by the codegen. +- tag: nondeterministic_seeded + desc: | + This tag indicates if an operator is nondeterministically seeded + (i.e., is random) such that the operator intentionally produces + different results when run twice on the same inputs, but this randomness + is controlled by a Generator which, if reseeded would give you the + same result. 
+- tag: nondeterministic_bitwise + desc: | + This tag indicates if an operator doesn't guarantee bitwise equivalence + across different runs of an operator with identical inputs. +- tag: needs_exact_strides + desc: | + This tag indicates that the operator should be passed Tensors following + the same strides as observed in eager when compiled in inductor. + Only one of {needs_exact_strides, needs_contiguous_strides, needs_fixed_stride_order, flexible_layout} + can apply; if multiple are assigned then we assume the most restrictive one. +- tag: needs_contiguous_strides + desc: | + This tag indicates that the operator should be passed contiguous Tensors. + Failure to do so will result in undefined behavior. +- tag: needs_fixed_stride_order + desc: | + This tag indicates that the operator should be passed Tensors following + the same stride permutation as observed in eager when compiled in inductor. + Only one of {needs_exact_strides, needs_contiguous_strides, needs_fixed_stride_order, flexible_layout} + can apply; if multiple are assigned then we assume the most restrictive one. +- tag: flexible_layout + desc: | + This tag indicates that the custom operator can accept inputs with varying + strides/storage_offset and that when compiled, Inductor is allowed to change + the strides/storage_offset of inputs to the custom operator. + Only one of {needs_exact_strides, needs_contiguous_strides, needs_fixed_stride_order, flexible_layout} + can apply; if multiple are assigned then we assume the most restrictive one. + +# NOTE [Core ATen Ops] +- tag: core + desc: | + Core aten ops is a subset of aten ops that remains after aten-to-aten decomposition and + functionalization pass. Core aten ops are fully functional and adhere to single static + assignment (SSA): this implies there will be no `inplace` or `_out` variants in this opset. + This opset is designed to serve as the functional IR to interface with compiler backends. + In contrast to primTorch, core aten opset doesn't decompose ops into explicit + type promotion and broadcasting ops. + Core aten ops is also effectively the opset produced by torchdynamo.export(aten_graph=True), + and thus can be used as an opset for export purpose. +- tag: pointwise + desc: | + Pointwise operators are operators where each element of the output is computed only by accessing + the corresponding element of all the broadcasted inputs. The output shape will be the broadcasted + shape of the inputs. +- tag: maybe_aliasing_or_mutating + desc: | + For some ops, we can't statically determine whether the op is functional or not. Note that this is only + relevant to CIA ops that decompose before functionalization/autograd. It is useful to + know this information for export as we would want to decompose these ops as they are unsafe to be + preserved. +- tag: cudagraph_unsafe + desc: | + This operator does not support cudagraphs. The presence of this tag on an operator will cause + Inductor to split the graph around this operator. Note that operators without this tag may still + not support CUDAGraphs. Inductor may have other hardcoded lists around that. +- tag: reduction + desc: | + This tag indicates that an operator performs a reduction operation, computing aggregate values + (sum, mean, max, min, etc.) across one or more dimensions of the input tensor(s). 
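The two YAML files above are inputs to torchgen, but their effect is visible from Python: each tag in tags.yaml becomes a member of the `torch.Tag` enum, and every operator overload exposes the tags assigned to it in native_functions.yaml. A minimal sketch of how the entries above surface at runtime (op and tag names are taken from the hunks above; exact tag sets can vary across PyTorch builds):

```python
import torch
import torch.nn.functional as F

# Tags declared in tags.yaml surface as the torch.Tag enum, and each
# OpOverload carries the tags assigned to it in native_functions.yaml.
bessel = torch.ops.aten.special_bessel_j0.default
assert torch.Tag.pointwise in bessel.tags

sdpa = torch.ops.aten.scaled_dot_product_attention.default
assert torch.Tag.nondeterministic_seeded in sdpa.tags

# nondeterministic_seeded means the randomness (here: attention dropout)
# is driven by the global Generator, so reseeding should reproduce a result.
q = k = v = torch.randn(1, 2, 4, 8)
torch.manual_seed(0)
a = F.scaled_dot_product_attention(q, k, v, dropout_p=0.5)
torch.manual_seed(0)
b = F.scaled_dot_product_attention(q, k, v, dropout_p=0.5)
assert torch.equal(a, b)  # same seed -> same dropout mask -> same output
```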
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/ATenOpList.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/ATenOpList.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..5de3424857e236917eb68940e7904446de59f586
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/ATenOpList.cpp
@@ -0,0 +1,36 @@
+#include <ATen/core/ATenOpList.h>
+
+#include <string>
+#include <cstring>
+#include <utility>
+#include <unordered_set>
+#include <ATen/core/operator_name.h>
+
+// ${generated_comment}
+
+namespace at {
+
+namespace {
+struct OpNameEquals final {
+  bool operator()(const std::pair<const char*, const char*>& lhs, const std::pair<const char*, const char*>& rhs) const {
+    return 0 == strcmp(lhs.first, rhs.first) && 0 == strcmp(lhs.second, rhs.second);
+  }
+};
+
+struct OpNameHash final {
+  size_t operator()(const std::pair<const char*, const char*>& p) const {
+    // use std::hash<std::string> because std::hash<const char*> would hash pointers and not pointed-to strings
+    return std::hash<std::string>()(p.first) ^ (~ std::hash<std::string>()(p.second));
+  }
+};
+}
+
+bool is_custom_op(const c10::OperatorName& opName) {
+  static std::unordered_set<std::pair<const char*, const char*>, OpNameHash, OpNameEquals> ops {
+    ${aten_ops}
+    {"", ""}
+  };
+  return ops.count(std::make_pair(
+      opName.name.c_str(), opName.overload_name.c_str())) == 0;
+}
+}
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/CompositeViewCopyKernels.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/CompositeViewCopyKernels.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..47097d7aa4320674bec4bddbb5ac861309334f0c
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/CompositeViewCopyKernels.cpp
@@ -0,0 +1,73 @@
+#define TORCH_ASSERT_ONLY_METHOD_OPERATORS
+// ${generated_comment}
+
+#include <ATen/InferSize.h>
+#include <ATen/Tensor.h>
+#include <ATen/native/Resize.h>
+
+#ifndef AT_PER_OPERATOR_HEADERS
+#include <ATen/Operators.h>
+#else
+#include <ATen/ops/clone.h>
+$ops_headers
+#endif
+
+namespace at {
+namespace native {
+
+// This file contains a number of kernels for aten functions that are fully code-generated.
+// TODO: rename this file to something more generic.
+
+namespace {
+at::Tensor clone_arg(const at::Tensor& t) {
+  return t.clone();
+}
+
+std::vector<at::Tensor> clone_arg(const at::TensorList& t_list) {
+  std::vector<at::Tensor> out(t_list.size());
+  for (const auto& i : c10::irange(t_list.size())) {
+    out[i] = t_list[i].clone();
+  }
+  return out;
+}
+
+// duped with gen_resize_out_helper from structured kernels
+void copy_arg(const at::Tensor& dst, const at::Tensor& src) {
+  TORCH_CHECK(src.dtype() == dst.dtype(),
+      "Expected out tensor to have dtype ", src.dtype(), ", but got ", dst.dtype(), " instead");
+  TORCH_CHECK(src.device() == dst.device(),
+      "Expected out tensor to have device ", src.device(), ", but got ", dst.device(), " instead");
+  dst.copy_(src);
+}
+
+void copy_arg(const at::TensorList& dst, const at::TensorList& src) {
+  TORCH_INTERNAL_ASSERT(dst.size() == src.size());
+  for (const auto& i : c10::irange(dst.size())) {
+    copy_arg(dst[i], src[i]);
+  }
+}
+
+// TODO: this doesn't handle restriding empty tensors correctly; see
+// gen_resize_out_helper for the correct algorithm
+
+void resize_out_helper(const at::Tensor& dst, const at::Tensor& src) {
+  at::native::resize_output(dst, src.sizes());
+}
+
+void resize_out_helper(const at::TensorList& dst, const at::TensorList& src) {
+  TORCH_INTERNAL_ASSERT(dst.size() == src.size());
+  for (const auto& i : c10::irange(dst.size())) {
+    at::native::resize_output(dst[i], src[i].sizes());
+  }
+}
+}
+
+
+${CompositeViewCopyKernel_Definitions}
+
+${GeneratedCompositeFunctional_Definitions}
+
+${GeneratedCompositeOut_Definitions}
+
+} // namespace native
+} // namespace at
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/DispatchKeyFunction.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/DispatchKeyFunction.h
new file mode 100644
index 0000000000000000000000000000000000000000..c92d5eb3898ecea0fb9e1f79c2725d1bc6dfa7fb
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/DispatchKeyFunction.h
@@ -0,0 +1,23 @@
+#pragma once
+// ${generated_comment}
+
+// NB: The implementing C++ file is RegisterDispatchKey.cpp
+
+// The only #includes we need are for custom classes that have defaults in the C++ API
+#include <c10/core/MemoryFormat.h>
+#include <c10/core/Scalar.h>
+#include <ATen/core/Reduction.h>
+
+// Forward declarations of any types needed in the operator signatures.
+// We can't directly include these classes because it will cause circular include dependencies.
+// This file is included by TensorBody.h, which defines the Tensor class.
+#include <ATen/core/ATen_fwd.h>
+
+namespace at {
+
+namespace ${dispatch_namespace} {
+
+${dispatch_namespaced_declarations}
+
+} // namespace ${dispatch_namespace}
+} // namespace at
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/DispatchKeyFunctions.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/DispatchKeyFunctions.h
new file mode 100644
index 0000000000000000000000000000000000000000..35f43297fdd9ca9f932c8c53b5b773f1b9b8a427
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/DispatchKeyFunctions.h
@@ -0,0 +1,29 @@
+#include <ATen/core/TensorBody.h>
+
+// TODO Undo all logic introduced for Note [Avoiding Include Cycles In Static Dispatch]
+// Code introduced to avoid cyclic dependency in static dispatch is no longer
+// needed as static dispatch logic is moved from TensorBody.h, which caused cycles in the first place,
+// to Operators.cpp for supporting multiple backends with multiple kernels.
+//
+// Note [Avoiding Include Cycles In Static Dispatch]
+// In order to avoid #include cycles in the static dispatch build, we've carefully split out
+// the static function definition files into {DispatchKey}Functions.h and {DispatchKey}Functions_inl.h.
+//
+// Without this split, the include cycle looks like TensorBody.h -> CPUFunctions.h -> TensorBody.h.
+// - TensorBody.h #includes CPUFunctions.h in the static dispatch build, because the tensor methods
+//   all need to call into the fastpath C++ API defined in CPUFunctions.h. The methods are also all
+//   directly inlined into TensorBody.h.
+// - CPUFunctions.h #includes TensorBody.h because it contains function declarations for the entire C++ API,
+//   which include functions that have defaultable std::optional<Tensor> arguments.
+//   That requires knowing the full Tensor class definition.
+//
+// We break the cycle by doing the following:
+// - Split out CPUFunctions.h into two files: CPUFunctions.h and CPUFunctions_inl.h
+// - CPUFunctions.h is a dummy file that just includes the Tensor class and includes CPUFunctions_inl.h,
+// - CPUFunctions_inl.h includes everything else
+// - (only in the static dispatch build) TensorBody.h makes sure to finish defining the Tensor class,
+//   and then it includes CPUFunctions_inl.h.
+// - All other files that want the cpu fastpath functions can include CPUFunctions.h directly.
+// - This also means that in the static dispatch build, CPUFunctions.h only needs to
+//   #include TensorBody.h, and it will automatically bring in CPUFunctions_inl.h.
+${inline_headers}
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/DispatchKeyFunctions_inl.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/DispatchKeyFunctions_inl.h
new file mode 100644
index 0000000000000000000000000000000000000000..fbb71c2cb123cb21fb57ec32341d86bff06f6a17
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/DispatchKeyFunctions_inl.h
@@ -0,0 +1,22 @@
+#pragma once
+// ${generated_comment}
+
+// NB: The implementing C++ file is RegisterDispatchKey.cpp
+
+// The only #includes we need are for custom classes that have defaults in the C++ API
+#include <c10/core/MemoryFormat.h>
+#include <c10/core/Scalar.h>
+#include <ATen/core/Reduction.h>
+
+#if defined(AT_PER_OPERATOR_HEADERS) && defined(TORCH_ASSERT_ONLY_METHOD_OPERATORS)
+#error This change adds a dependency on all pytorch operators, meaning the \
+  file will need to be re-compiled every time an operator is changed or added. \
+  Consider including a specific operator from \
+  <ATen/ops/{my_operator}_{dispatch_namespace}_dispatch.h>. \
+  See NOTE [TORCH_ASSERT_ONLY_METHOD_OPERATORS].
+#endif
+
+${DispatchKeyFunctions_inl_includes}
+
+
+${dispatch_namespaced_declarations}
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/DispatchKeyNativeFunctions.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/DispatchKeyNativeFunctions.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..7647f459a744b2eacfac6aaea4f49b86babbb234
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/DispatchKeyNativeFunctions.cpp
@@ -0,0 +1,13 @@
+// ${generated_comment}
+${includes}
+${native_functions_include}
+
+namespace {
+${helper_fns}
+} // namespace
+
+${namespace_prologue}
+
+${native_function_definitions}
+
+${namespace_epilogue}
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/DispatchKeyNativeFunctions.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/DispatchKeyNativeFunctions.h
new file mode 100644
index 0000000000000000000000000000000000000000..b45a17b5922f8a0b76e0237616914ce9969efca5
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/DispatchKeyNativeFunctions.h
@@ -0,0 +1,19 @@
+#pragma once
+
+// an external backend might generate files within its code tree
+// and check all the source files within the tree with clang-format.
+// so, disable it since the backend might have a different config.
+// clang-format off
+
+// ${generated_comment}
+
+#include <ATen/Tensor.h>
+
+${namespace_prologue}
+
+struct ${class_name} {
+
+${dispatch_declarations}
+
+};
+${namespace_epilogue}
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Function.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Function.h
new file mode 100644
index 0000000000000000000000000000000000000000..73096afbf11571cbe4147bb63f035a054ca842db
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Function.h
@@ -0,0 +1,27 @@
+#pragma once
+
+// ${generated_comment}
+
+#include <ATen/Context.h>
+#include <ATen/DeviceGuard.h>
+#include <ATen/TensorUtils.h>
+#include <ATen/TracerMode.h>
+#include <ATen/core/Generator.h>
+#include <ATen/core/Reduction.h>
+#include <ATen/core/Tensor.h>
+#include <c10/core/Scalar.h>
+#include <c10/core/Storage.h>
+#include <c10/core/TensorOptions.h>
+#include <c10/util/Deprecated.h>
+#include <c10/util/Optional.h>
+#include <c10/util/OptionalArrayRef.h>
+
+${static_dispatch_ops_headers}
+
+${operator_includes}
+
+namespace at {
+
+${function_definitions}
+
+}
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/FunctionalInverses.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/FunctionalInverses.h
new file mode 100644
index 0000000000000000000000000000000000000000..b15cd09a6c65da3127be8245b87bff2f8c795a3d
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/FunctionalInverses.h
@@ -0,0 +1,23 @@
+#pragma once
+
+// ${generated_comment}
+
+#include <ATen/Tensor.h>
+#include <ATen/FunctionalStorageImpl.h>
+
+namespace at {
+namespace functionalization {
+
+struct FunctionalInverses {
+
+${view_inverse_declarations}
+
+// NB: These are not generated! They're manually implemented in the template.
+// TODO: Change codegen to generate these. See the following link:
+// https://github.com/pytorch/pytorch/blob/main/torchgen/model.py#L2583-L2585
+static at::Tensor chunk_inverse(const at::Tensor & base, const at::Tensor & mutated_view, InverseReturnMode inverse_return_mode, int64_t mutated_view_idx, int chunks, int dim);
+static at::Tensor narrow_inverse(const at::Tensor & base, const at::Tensor & mutated_view, InverseReturnMode inverse_return_mode, int dim, c10::SymInt start, c10::SymInt length);
+
+};
+}
+}
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Functions.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Functions.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..f210402e543aa2de27ea0f510bb869e0c7010e22
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Functions.cpp
@@ -0,0 +1,105 @@
+#include <array>
+
+#include <ATen/Functions.h>
+#include <ATen/Utils.h>
+#include <c10/core/Allocator.h>
+
+namespace at {
+
+Tensor TensorMaker::make_tensor() {
+  AutoDispatchBelowADInplaceOrView guard{}; // TODO: Remove.
+  tracer::impl::NoTracerDispatchMode tracer_guard{};
+
+  check_size_nonnegative(sizes_);
+
+  TORCH_CHECK_VALUE(
+      !deleter_ || !ctx_,
+      "The deleter and context arguments are mutually exclusive.");
+
+  if (device_ == std::nullopt) {
+    device_ = globalContext().getDeviceFromPtr(data_, opts_.device().type());
+  }
+
+  if (opts_.device().has_index()) {
+    // clang-format off
+    TORCH_CHECK_VALUE(
+        opts_.device() == *device_,
+        "Specified device ", opts_.device(), " does not match device of data ", *device_);
+    // clang-format on
+  }
+
+  std::size_t size_bytes = computeStorageSize();
+
+  DataPtr data_ptr{};
+  if (deleter_) {
+    data_ptr = makeDataPtrFromDeleter();
+  } else {
+    data_ptr = makeDataPtrFromContext();
+  }
+
+  TORCH_CHECK(!resizeable_ || allocator_ != nullptr, "Must specify an allocator with allocator() if you want to use resizeable_storage()");
+  Storage storage{Storage::use_byte_size_t{}, size_bytes, std::move(data_ptr), /*allocator=*/allocator_, /*resizable=*/resizeable_};
+
+  Tensor tensor = detail::make_tensor<TensorImpl>(
+      std::move(storage), opts_.computeDispatchKey(), opts_.dtype());
+
+  TensorImpl* tensor_impl = tensor.unsafeGetTensorImpl();
+  if (strides_) {
+    tensor_impl->set_sizes_and_strides(sizes_, *strides_);
+  } else {
+    tensor_impl->set_sizes_contiguous(sizes_);
+  }
+  if (storage_offset_) {
+    tensor_impl->set_storage_offset(*storage_offset_);
+  }
+
+  tensor_impl->set_requires_grad(opts_.requires_grad());
+
+  return tensor;
+ }
+
+ std::size_t TensorMaker::computeStorageSize() const noexcept {
+   std::size_t itemsize = opts_.dtype().itemsize();
+
+   if (strides_) {
+     auto storage_size = detail::computeStorageNbytes(sizes_, *strides_, itemsize);
+     if (storage_offset_) {
+       storage_size += storage_offset_.value() * itemsize;
+     }
+     return storage_size;
+   }
+
+   std::size_t size = 1;
+   for (std::int64_t s : sizes_) {
+     size *= static_cast<std::size_t>(s);
+   }
+   auto storage_size = size * itemsize;
+   if (storage_offset_) {
+     storage_size += storage_offset_.value() * itemsize;
+   }
+   return storage_size;
+ }
+
+ inline DataPtr TensorMaker::makeDataPtrFromDeleter() noexcept {
+   return InefficientStdFunctionContext::makeDataPtr(data_, std::move(deleter_), *device_);
+ }
+
+ inline DataPtr TensorMaker::makeDataPtrFromContext() noexcept {
+   return DataPtr{data_, ctx_.release(), ctx_.get_deleter(), *device_};
+ }
+
+ IntArrayRef TensorMaker::makeTempSizes() const noexcept {
+   static std::int64_t zeros[5] = {0, 0, 0, 0, 0};
+   if (opts_.has_memory_format()) {
+    MemoryFormat format = *opts_.memory_format_opt();
+    if (format == MemoryFormat::ChannelsLast) {
+      return IntArrayRef(zeros, 4);
+    }
+    if (format == MemoryFormat::ChannelsLast3d) {
+      return IntArrayRef(zeros, 5);
+    }
+  }
+  return IntArrayRef(zeros, 1);
+}
+
+} // namespace at
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Functions.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Functions.h
new file mode 100644
index 0000000000000000000000000000000000000000..b1feaf9d4daa9786359c97434e4c59d3c75778c7
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Functions.h
@@ -0,0 +1,143 @@
+#pragma once
+
+// ${generated_comment}
+
+#ifdef TORCH_ASSERT_NO_OPERATORS
+#error This change adds a dependency on native_functions.yaml, \
+  meaning the file will need to be re-compiled every time an operator \
+  is changed or added. Consider if your change would be better placed in \
+  another file, or if a more specific header might achieve the same goal. \
+  See NOTE: [Tensor vs. TensorBase]
+#endif
+
+#if defined(AT_PER_OPERATOR_HEADERS) && defined(TORCH_ASSERT_ONLY_METHOD_OPERATORS)
+#error This change adds a dependency on all pytorch operators, meaning the \
+  file will need to be re-compiled every time an operator is changed or added. \
+  Consider including a specific operator from <ATen/ops/{my_operator}.h> and \
+  see NOTE [TORCH_ASSERT_ONLY_METHOD_OPERATORS].
+#endif
+
+// NOTE: [TORCH_ASSERT_ONLY_METHOD_OPERATORS]
+//
+// In ATen, certain generated header files include the definitions of
+// every single operator in PyTorch. Unfortunately this means every
+// time an operator signature is updated or changed in
+// native_functions.yaml, you (and every other PyTorch developer) need
+// to recompile every source file that includes any of these headers.
+//
+// To break up these header dependencies and improve incremental
+// build times for all PyTorch developers, these headers are split
+// into per-operator headers in the `ATen/ops` folder. This limits
+// incremental builds to only changes to methods of `Tensor`, or files
+// that use the specific operator being changed. With `at::sum` as an
+// example, you should include
+//
+//   <ATen/ops/sum.h>               // instead of ATen/Functions.h
+//   <ATen/ops/sum_native.h>        // instead of ATen/NativeFunctions.h
+//   <ATen/ops/sum_ops.h>           // instead of ATen/Operators.h
+//   <ATen/ops/sum_cpu_dispatch.h>  // instead of ATen/CPUFunctions.h
+//
+// However, even if you're careful to use this in your own code,
+// `Functions.h` might be included indirectly through another header
+// without you realising it. To avoid this, you can add
+//
+//   #define TORCH_ASSERT_ONLY_METHOD_OPERATORS
+//
+// to the top of your source file. This way, any time the non-specific
+// headers are included, the compiler will error out.
+//
+// Also, be aware that `ops` are not available in all build
+// configurations (namely fb-internal) so you must guard these
+// includes with `#ifdef AT_PER_OPERATOR_HEADERS`. e.g.
+// +// #ifndef AT_PER_OPERATOR_HEADERS +// #include +// #else +// #include +// #endif + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +${Functions_includes} + +namespace at { + +${Functions_declarations} + +// Special C++ only overloads for std()-like functions (See gh-40287) +// These are needed because int -> bool conversion takes precedence over int -> IntArrayRef +// So, for example std(0) would select the std(unbiased=False) overload +inline Tensor var(const Tensor& self, int dim) { + return at::var(self, IntArrayRef{dim}); +} +inline std::tuple var_mean(const Tensor& self, int dim) { + return at::var_mean(self, IntArrayRef{dim}); +} +inline Tensor std(const Tensor& self, int dim) { + return at::std(self, IntArrayRef{dim}); +} +inline std::tuple std_mean(const Tensor& self, int dim) { + return at::std_mean(self, IntArrayRef{dim}); +} + +inline int64_t numel(const Tensor& tensor) { + return tensor.numel(); +} + +inline int64_t size(const Tensor& tensor, int64_t dim) { + return tensor.size(dim); +} + +inline int64_t stride(const Tensor& tensor, int64_t dim) { + return tensor.stride(dim); +} + +inline bool is_complex(const Tensor& tensor) { + return tensor.is_complex(); +} + +inline bool is_floating_point(const Tensor& tensor) { + return tensor.is_floating_point(); +} + +inline bool is_signed(const Tensor& tensor) { + return tensor.is_signed(); +} + +inline bool is_inference(const Tensor& tensor) { + return tensor.is_inference(); +} + +inline bool _is_zerotensor(const Tensor& tensor) { + return tensor._is_zerotensor(); +} + +inline bool is_conj(const Tensor& tensor) { + return tensor.is_conj(); +} + +inline Tensor conj(const Tensor& tensor) { + return tensor.conj(); +} + +inline bool is_neg(const Tensor& tensor) { + return tensor.is_neg(); +} + +} diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/LazyIr.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/LazyIr.h new file mode 100644 index 0000000000000000000000000000000000000000..9190ff8243d316fd2bd472bb3f0603701761bdb7 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/LazyIr.h @@ -0,0 +1,19 @@ +#pragma once + +// This file contains autogenerated LazyTensor IR nodes +${lazy_ir_sysinc} +${lazy_ir_inc} + +${namespace_prologue} +using at::operator<<; + +// kNullValue is used to contribute a static hash value any time +// a node has an Optional input that is nullopt. It is important +// to differentiate between HASH(std::nullopt, something) and HASH(something, std::nullopt), +// and using kNullValue in the hash function in the order of arguments +// serves this purpose. 
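+//
+// As an illustrative restatement only (the hash-combining helper name below
+// is a stand-in, not the real torch::lazy API):
+//
+//   // Hash(kNullValue, x.hash()) != Hash(x.hash(), kNullValue)
+//
+// so a node built from (nullopt, x) hashes differently from one built from
+// (x, nullopt).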
+static const torch::lazy::Value kNullValue = torch::lazy::Value(); + +${ir_declarations} + +${namespace_epilogue} diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/LazyNonNativeIr.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/LazyNonNativeIr.h new file mode 100644 index 0000000000000000000000000000000000000000..18eaf6da52e4b3654becac6cc89849bc0806ae09 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/LazyNonNativeIr.h @@ -0,0 +1,11 @@ +#pragma once + +${lazy_non_native_ir_inc} + +// This file contains autogenerated LazyTensor Non Native IR nodes + +${namespace_prologue} + +${non_native_ir_nodes} + +${namespace_epilogue} diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/MethodOperators.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/MethodOperators.h new file mode 100644 index 0000000000000000000000000000000000000000..0e192cd05ef3c78fa74848c93de32150c1e3fd8b --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/MethodOperators.h @@ -0,0 +1,24 @@ +#pragma once + +// ${generated_comment} + +#ifdef TORCH_ASSERT_NO_OPERATORS +#error This change adds a dependency on native_functions.yaml, \ + meaning the file will need to be re-compiled every time an operator \ + is changed or added. Consider if your change would be better placed in \ + another file, or if a more specific header might achieve the same goal. \ + See NOTE: [Tensor vs. TensorBase] +#endif + +// Forward declarations of any types needed in the operator signatures. +// We can't directly include these classes because it will cause circular include dependencies. +// This file is included by TensorBody.h, which defines the Tensor class. +#include + +${MethodOperators_includes} + +namespace at { +namespace _ops { +${MethodOperators_declarations} +} // namespace _ops +} // namespace at diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/NativeFunction.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/NativeFunction.h new file mode 100644 index 0000000000000000000000000000000000000000..a5441ad85d1d5e28c4e31dd3f0dc7f66dfbff9e7 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/NativeFunction.h @@ -0,0 +1,17 @@ +#pragma once + +// ${generated_comment} + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +${extra_includes} + +${native_function_declarations} diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/NativeFunctions.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/NativeFunctions.h new file mode 100644 index 0000000000000000000000000000000000000000..9dc972495ca038bddb7b887c39c2e0507e487213 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/NativeFunctions.h @@ -0,0 +1,33 @@ +#pragma once + +// ${generated_comment} + +#ifdef TORCH_ASSERT_NO_OPERATORS +#error This change adds a dependency on native_functions.yaml, \ + meaning the file will need to be re-compiled every time an operator \ + is changed or added. Consider if your change would be better placed in \ + another file, or if a more specific header might achieve the same goal. \ + See NOTE: [Tensor vs. 
TensorBase] +#endif + +#if defined(AT_PER_OPERATOR_HEADERS) && defined(TORCH_ASSERT_ONLY_METHOD_OPERATORS) +#error This change adds a dependency on all pytorch operators, meaning the \ + file will need to be re-compiled every time an operator is changed or added. \ + Consider including a specific operator from \ + and see NOTE [TORCH_ASSERT_ONLY_METHOD_OPERATORS]. +#endif + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +${NativeFunctions_includes} + +${NativeFunctions_declarations} diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/NativeMetaFunction.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/NativeMetaFunction.h new file mode 100644 index 0000000000000000000000000000000000000000..6522c97546d0498e4b3825fb4eafefbb34c71911 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/NativeMetaFunction.h @@ -0,0 +1,23 @@ +#pragma once + +// ${generated_comment} + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace at { +namespace meta { + +${meta_function_declarations} + +} // namespace native +} // namespace at diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/NativeMetaFunctions.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/NativeMetaFunctions.h new file mode 100644 index 0000000000000000000000000000000000000000..89989e2121c9aa34a4583205c3541a04edd36700 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/NativeMetaFunctions.h @@ -0,0 +1,19 @@ +#pragma once + +// ${generated_comment} + +#include +#include +#include +#include + +${NativeMetaFunctions_includes} + +namespace at { + +namespace meta { + +${NativeMetaFunctions_declarations} + +} // namespace meta +} // namespace at diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Operator.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Operator.h new file mode 100644 index 0000000000000000000000000000000000000000..ed220f917290c2062481eb53dca232b47d180e2d --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Operator.h @@ -0,0 +1,19 @@ +#pragma once + +// ${generated_comment} + +#include +#include +#include + +// Forward declarations of any types needed in the operator signatures. +// We can't directly include these classes because it will cause circular include dependencies. +// This file is included by TensorBody.h, which defines the Tensor class. 
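+//
+// Illustrative sketch of the pattern (not part of the generated header): a
+// forward declaration is enough for a signature that only names the type, so
+// the full class definition never has to be pulled in here:
+//
+//   namespace at { class Tensor; }               // forward declaration only
+//   at::Tensor my_op(const at::Tensor& self);    // hypothetical signature
+//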
+#include + +namespace at { +namespace _ops { + +${declarations} + +}} // namespace at::_ops diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Operators.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Operators.cpp new file mode 100644 index 0000000000000000000000000000000000000000..082bb67c3e2043f2c36b29345f57048ec2e9eea7 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Operators.cpp @@ -0,0 +1,19 @@ +#include +#include + +// ${generated_comment} +// NOTE See [Sharded File] comment in VariableType + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +${operator_headers} +#endif + +${static_dispatch_extra_headers} + +namespace at { namespace _ops { + +${definitions} + +}} // namespace at::_ops diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Operators.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Operators.h new file mode 100644 index 0000000000000000000000000000000000000000..e74b96ef3d5c6b6d50fe63eac4dca51f0655daa5 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/Operators.h @@ -0,0 +1,74 @@ +#pragma once + +// ${generated_comment} + +#ifdef TORCH_ASSERT_NO_OPERATORS +#error This change adds a dependency on native_functions.yaml, \ + meaning the file will need to be re-compiled every time an operator \ + is changed or added. Consider if your change would be better placed in \ + another file, or if a more specific header might achieve the same goal. \ + See NOTE: [Tensor vs. TensorBase] +#endif + +#if defined(AT_PER_OPERATOR_HEADERS) && defined(TORCH_ASSERT_ONLY_METHOD_OPERATORS) +#error This change adds a dependency on all pytorch operators, meaning the \ + file will need to be re-compiled every time an operator is changed or added. \ + Consider including a specific operator from \ + and see NOTE [TORCH_ASSERT_ONLY_METHOD_OPERATORS]. +#endif + +#include +#include +#include +#include +#include +#include +#include +#include + +${Operators_includes} + +// Extension writers: do you write wrapper functions? Are you frustrated with +// resolving overloads of operators? Are you frustrated with dealing with +// pointer-to-methods and resolving overloads of pointer-to-methods?? Look no +// further, this is the utility for you. +// +// Given an operator schema: aten::op.overload(... +// +// Use ATEN_FN2(op, overload) to get a *function* version of the operator +// that is guaranteed to not be overloaded. This means that you can safely +// decltype(&ATEN_FN2(op, overload)) it. NB: the 2 means this macro takes 2 args. +// +// Given an operator schema without an overload name: aten::op(... +// +// Use ATEN_FN(op) to get an unambiguous *function* version of the operator. +// +// There is some interesting behavior for out= operations. +// ATEN_FN2(sin, out) gives a function that is *faithful* to the schema; +// that is, the order of arguments is exactly what it looks like in the schema. + +#define ATEN_FN2(op_name, overload) at::_ops::op_name##_##overload::call +#define ATEN_FN(op_name) at::_ops::op_name::call + +// Separately, ATEN_OP(op) and ATEN_OP2(op, overload) define a class containing compile-time +// metadata about a given aten operator. 
+// Notable data on the class includes: +// - ATEN_OP2(add, Tensor)::name // returns the string name: "add" +// - ATEN_OP2(add, Tensor)::overload_name // returns the string overload name: "Tensor" +// - ATEN_OP2(add, Tensor)::schema // returns the C++ schema type: at::Tensor (const at::Tensor &, const at::Tensor &, const at::Scalar &) +// - ATEN_OP2(add, Tensor)::schema_str // returns the string jit type: "add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor" + +#define ATEN_OP2(op_name, overload) at::_ops::op_name##_##overload +#define ATEN_OP(op_name) at::_ops::op_name + +// WARNING: Please do not call any of the ops in the _ops namespace directly. +// Use the ATEN_FN macros. We do not guarantee stability of the naming +// scheme for the functions in at::_ops + +// See Note [The ATen Operators API] for details of the at::_ops namespace + +namespace at { +namespace _ops { +${Operators_declarations} +} // namespace _ops +} // namespace at diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RedispatchFunctions.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RedispatchFunctions.cpp new file mode 100644 index 0000000000000000000000000000000000000000..58102bd97fca4eaef477818b0b0a92b7995e38b1 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RedispatchFunctions.cpp @@ -0,0 +1,15 @@ +// ${generated_comment} + +#include +#include + +#include +#include + +namespace at { + +namespace redispatch { + ${function_redispatch_definitions} +} // namespace redispatch + +} // namespace at diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RedispatchFunctions.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RedispatchFunctions.h new file mode 100644 index 0000000000000000000000000000000000000000..2422cdd409cfdd59c2a05df27d28bb25ee610463 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RedispatchFunctions.h @@ -0,0 +1,32 @@ +#pragma once + +// ${generated_comment} + +#ifdef TORCH_ASSERT_ONLY_METHOD_OPERATORS +#error This change adds a dependency on all pytorch operators, meaning the \ + file will need to be re-compiled every time an operator is changed or added. \ + Consider using the at::_ops::{name}::redispatch() interface by including \ + the specific operator from +#endif + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace at { + +namespace redispatch { + ${function_redispatch_definitions} +} // namespace redispatch + +} diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterBackendSelect.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterBackendSelect.cpp new file mode 100644 index 0000000000000000000000000000000000000000..018cf358f11237d5bdc9bca01aa8d09d1462f574 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterBackendSelect.cpp @@ -0,0 +1,29 @@ +// We register ops with a higher priority dispatch key (BackendSelect) than the usual backend-specific keys (e.g. CPU) +// which makes calls to the factory functions dispatch to here. +// We then 'manually' compute a lower-priority to re-dispatch to (e.g. CPU) to get to the eventually correct backend. 
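+//
+// Rough sketch of one such redirect (names and argument lists are
+// illustrative, not the exact generated code): a factory wrapper computes the
+// backend dispatch key from its TensorOptions-style arguments and then
+// re-dispatches below BackendSelect:
+//
+//   // at::Tensor empty_memory_format(IntArrayRef size, ..., std::optional<Device> device, ...) {
+//   //   DispatchKeySet _dk = c10::DispatchKeySet(c10::computeDispatchKey(dtype, layout, device));
+//   //   return at::_ops::empty_memory_format::redispatch(_dk, size, ...);
+//   // }
+//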
+// ${generated_comment} + +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else + +${ops_headers} +#endif + +namespace at { + +namespace { + +${backend_select_method_definitions} + +TORCH_LIBRARY_IMPL(aten, BackendSelect, m) { + ${backend_select_function_registrations}; +} + +} // namespace +} // at diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterCodegenUnboxedKernels.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterCodegenUnboxedKernels.cpp new file mode 100644 index 0000000000000000000000000000000000000000..279f987c66a26c2eb5d11c664c85b3604b67684b --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterCodegenUnboxedKernels.cpp @@ -0,0 +1,41 @@ +#include +#include +#include + +#include + +// ${generated_comment} + +// NOTE [Sharded File]: This file is generated in a sharded fashion to speed up +// incremental rebuilds. See the comment at the top of +// templates/VariableType.cpp for an analogous, in-depth discussion. +// +// Generated by tools/jit/gen_unboxing.py. This file registers all ATen ops into JIT op registry instead of c10 +// dispatcher. JIT op registry only takes boxed kernels, so we are calling unboxing functions in UnboxingFunctions.h +// to cast arguments into C++ types (instead of IValue) and delegate to unboxed kernels. + +namespace torch { namespace jit { + +using autograd::Variable; +using autograd::variable_list; +using at::Scalar; +using at::ScalarType; +using at::Tensor; +using at::TensorOptions; +using at::DeviceGuard; + +using ::c10::fmap; +using ::c10::filter; + +namespace { + +RegisterOperators reg({ + + // Generated operators + ${unboxed_ops} +}); + +} // anon namespace + + +}} // namespace torch::jit diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterDispatchDefinitions.ini b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterDispatchDefinitions.ini new file mode 100644 index 0000000000000000000000000000000000000000..97c921de18f62832d1ca09c245f2466541fe908d --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterDispatchDefinitions.ini @@ -0,0 +1,22 @@ +${ns_prologue} + +// NB: TORCH_LIBRARY_IMPL must be in an anonymous namespace to avoid +// ambiguity with conflicting identifiers that may have been defined in +// at namespace already. +namespace { + +${dispatch_anonymous_definitions} + +${static_init_dispatch_registrations} + +} // anonymous namespace + +${deferred_dispatch_registrations} + +namespace ${dispatch_namespace} { + +${dispatch_namespaced_definitions} + +} // namespace ${dispatch_namespace} + +${ns_epilogue} diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterDispatchKey.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterDispatchKey.cpp new file mode 100644 index 0000000000000000000000000000000000000000..39c85b00d7a1be5471b496b7871aae825b39df9e --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterDispatchKey.cpp @@ -0,0 +1,52 @@ +// an external backend might generate file within its code tree +// and check all the source files within the tree with clang-format. +// so, disable it since the backend might have a different config. 
+// clang-format off + +// NOTE: This condition is true for all PyTorch internal libraries, it +// just excludes external projects such as torch_xla which +// reuse some of the PyTorch codegen machinery. +#if defined(CAFFE2_BUILD_MAIN_LIB) || \ + defined(TORCH_CUDA_BUILD_MAIN_LIB) || \ + defined(TORCH_HIP_BUILD_MAIN_LIB) || \ + defined(TORCH_XPU_BUILD_MAIN_LIB) +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#endif + +// ${generated_comment} + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#include +#include +#include +$extra_cuda_headers +$external_backend_headers +$dispatch_headers +$ops_headers + +namespace at { +namespace { +$dispatch_helpers +} // namespace +} // namespace at + +// See template file RegisterDispatchDefinitions.ini +$dispatch_definitions diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterFunctionalization.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterFunctionalization.cpp new file mode 100644 index 0000000000000000000000000000000000000000..408aff0cdab40461a7ba731bab216a7b7435331e --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterFunctionalization.cpp @@ -0,0 +1,116 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +// ${generated_comment} + +#include +#include +#include +#include +#include +#include + +#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +// needed for the meta tensor calls to get stride info in functionalization +#include +// needed for special handling of copy_(). +// See Note [functionalizating copy_() and not preserving strides] +#include +#include + +$ops_headers +#endif + +namespace at { +namespace functionalization { + +// This keyset is used by functionalization when it calls into meta kernels +// to accurately propagate stride metadata. +// Exclude any modes: the purpose of calling into meta kernels is only as an implementation +// detail to perform shape inference, and we don't want any modal keys to run. +// Specifically, we want to prevent functionalization and Python modes from running. +constexpr auto exclude_keys_for_meta_dispatch = + c10::functorch_transforms_ks | + c10::DispatchKeySet({ + c10::DispatchKey::FuncTorchDynamicLayerBackMode, + c10::DispatchKey::FuncTorchDynamicLayerFrontMode, + c10::DispatchKey::Python, + c10::DispatchKey::PreDispatch, + + }); + +// Helper around at::has_internal_overlap. +// The ATen util is used in hot-path eager mode: it's always fast, +// but might return TOO_HARD sometimes. +// During functionalization, we're ok taking a bit longer +// to detect memory overlap. 
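+// (Illustrative summary: at::has_internal_overlap reports MemOverlap::Yes for
+// e.g. expand()ed tensors whose strides contain a 0, MemOverlap::No for
+// contiguous tensors, and TOO_HARD when the analysis is inconclusive.)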
+inline bool has_internal_overlap_helper(const at::Tensor t) {
+  auto has_overlap = at::has_internal_overlap(t);
+  if (has_overlap == at::MemOverlap::Yes) return true;
+  if (has_overlap == at::MemOverlap::No) return false;
+  return false;
+}
+
+
+inline Tensor to_meta(const Tensor& t) {
+  if (!t.defined()) return t;
+  return at::native::empty_strided_meta_symint(t.sym_sizes(), t.sym_strides(),
+/*dtype=*/t.scalar_type(), /*layout=*/t.layout(),
+/*device=*/c10::Device(kMeta), /*pin_memory=*/std::nullopt);
+}
+
+inline std::optional<Tensor> to_meta(const std::optional<Tensor>& t) {
+  if (t.has_value()) {
+    return to_meta(*t);
+  }
+  return std::nullopt;
+}
+
+inline std::vector<Tensor> to_meta(at::ITensorListRef t_list) {
+  std::vector<Tensor> outputs;
+  outputs.reserve(t_list.size());
+  for (const auto& tensor : t_list) {
+    outputs.push_back(to_meta(tensor));
+  }
+  return outputs;
+}
+
+inline c10::List<Tensor> to_meta(const c10::List<Tensor>& t_list) {
+  c10::List<Tensor> outputs;
+  outputs.reserve(t_list.size());
+  for (const auto i : c10::irange(t_list.size())) {
+    outputs.push_back(to_meta(t_list[i]));
+  }
+  return outputs;
+}
+
+inline c10::List<::std::optional<Tensor>> to_meta(const c10::List<::std::optional<Tensor>>& t_list) {
+  c10::List<::std::optional<Tensor>> outputs;
+  outputs.reserve(t_list.size());
+  for (const auto i : c10::irange(t_list.size())) {
+    outputs.push_back(to_meta(t_list[i]));
+  }
+  return outputs;
+}
+
+static bool disable_meta_reference() {
+  static auto env = c10::utils::get_env("TORCH_DISABLE_FUNCTIONALIZATION_META_REFERENCE");
+  return env == "1";
+}
+
+
+${func_definitions}
+
+} // namespace functionalization
+
+namespace {
+
+TORCH_LIBRARY_IMPL(aten, Functionalize, m) {
+  ${func_registrations};
+}
+
+} // namespace
+
+} // namespace at
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterSchema.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterSchema.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..029796d3e575b2bde85cfd44af9e6fcbb56466cd
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegisterSchema.cpp
@@ -0,0 +1,13 @@
+// ${generated_comment}
+#define TORCH_ASSERT_ONLY_METHOD_OPERATORS
+#include
+
+namespace at {
+TORCH_LIBRARY(aten, m) {
+  ${aten_schema_registrations};
+  // Distributed Ops
+  // Implementations located in torch/csrc/jit/runtime/register_distributed_ops.cpp
+  m.def("get_gradients(int context_id) -> Dict(Tensor, Tensor)");
+}
+${schema_registrations}
+} // namespace at
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegistrationDeclarations.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegistrationDeclarations.h
new file mode 100644
index 0000000000000000000000000000000000000000..5a0f0d0c7b44dabb60061d32ced243fe607069d8
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/RegistrationDeclarations.h
@@ -0,0 +1,4 @@
+// This file contains all native_functions that can be registered to
+// and the schema string that they should be registered with
+
+${registration_declarations}
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/TensorBody.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/TensorBody.h
new file mode 100644
index 0000000000000000000000000000000000000000..ba3490bb1b0711c19dc118fcf1bd5e0d9c7e2f03
--- /dev/null
+++
b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/TensorBody.h @@ -0,0 +1,756 @@ +#pragma once + +#ifdef TORCH_ASSERT_NO_OPERATORS +#error This change adds a dependency on native_functions.yaml, \ + meaning the file will need to be re-compiled every time an operator \ + is changed or added. Consider if your change would be better placed in \ + another file, or if a more specific header might achieve the same goal. \ + See NOTE: [Tensor vs. TensorBase] +#endif + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +#include + +namespace c10{ +template class List; +template class IListRef; +} +namespace at { +struct Generator; +struct Type; +class DeprecatedTypeProperties; +class Tensor; +} // namespace at +namespace at { +namespace indexing { +struct TensorIndex; +} // namespace indexing +} // namespace at + +namespace torch { namespace autograd { + +struct Node; + +}} // namespace torch::autograd + +namespace at { + +class OptionalTensorRef; +class TensorRef; +class Tensor; +using TensorList = ArrayRef; +using ITensorList = c10::IListRef; + +using Stream = c10::Stream; + +// Tensor is a "generic" object holding a pointer to the underlying TensorImpl object, which +// has an embedded reference count. In this way, Tensor is similar to boost::intrusive_ptr. +// +// For example: +// +// void func(Tensor a) { +// Tensor b = a; +// ... +// } +// +// In this example, when we say Tensor b = a, we are creating a new object that points to the +// same underlying TensorImpl, and bumps its reference count. When b goes out of scope, the +// destructor decrements the reference count by calling release() on the TensorImpl it points to. +// The existing constructors, operator overloads, etc. take care to implement the correct semantics. +// +// Note that Tensor can also be NULL, i.e. it is not associated with any underlying TensorImpl, and +// special care must be taken to handle this. +class TORCH_API Tensor: public TensorBase { + protected: + // Create a Tensor with a +0 reference count. Special care must be + // taken to avoid decrementing this reference count at destruction + // time. Intended to support MaybeOwnedTraits. + explicit Tensor(unsafe_borrow_t, const TensorBase& rhs): TensorBase(unsafe_borrow_t{}, rhs) {} + friend MaybeOwnedTraits; + friend OptionalTensorRef; + friend TensorRef; + + public: + Tensor() = default; + // This constructor should not be used by end users and is an implementation + // detail invoked by autogenerated code. + explicit Tensor( + c10::intrusive_ptr tensor_impl) + : TensorBase(std::move(tensor_impl)) {} + Tensor(const Tensor &tensor) = default; + Tensor(Tensor &&tensor) = default; + + // Implicitly move-constructible from TensorBase, but must be explicit to increase refcount + explicit Tensor(const TensorBase &base): TensorBase(base) {} + /*implicit*/ Tensor(TensorBase &&base): TensorBase(std::move(base)) {} + + // Creates a new wrapper from TensorImpl. Intentionally a free method because + // it should be used with care. 
Checks necessary invariants + static Tensor wrap_tensor_impl( + c10::intrusive_ptr tensor_impl) { + return TensorBase::wrap_tensor_impl(std::move(tensor_impl)); + } + + Tensor contiguous(MemoryFormat memory_format=MemoryFormat::Contiguous) const { + return TensorBase::contiguous(memory_format); + } + + Tensor conj() const { + if (!this->is_complex()) { + return *this; + } + + C10_DIAGNOSTIC_PUSH_AND_IGNORED_IF_DEFINED("-Wswitch-enum") + switch (this->layout()) { + case at::kSparse: + case at::kSparseCsr: + case at::kSparseCsc: + case at::kSparseBsr: + case at::kSparseBsc: + return this->conj_physical(); + default: + return this->_conj(); + } + C10_DIAGNOSTIC_POP() + } + + // Aliased by Dimname overloads, so need explicit using + using TensorBase::size; + using TensorBase::sym_size; + using TensorBase::stride; + + /// Should be used if *this can reasonably be expected to be contiguous and + /// performance is important. + /// Compared to contiguous, it saves a reference count + /// increment/decrement if *this is already contiguous, at the cost + /// in all cases of an extra pointer of stack usage, an extra branch + /// to access, and an extra branch at destruction time. + c10::MaybeOwned expect_contiguous(MemoryFormat memory_format=MemoryFormat::Contiguous) const &; + + // Use .contiguous() instead. Trying to borrow from a prvalue Tensor + // will only lead to trouble and dangling references. + c10::MaybeOwned expect_contiguous(MemoryFormat memory_format=MemoryFormat::Contiguous) && = delete; + + // The following overloads are very intriguing. Consider the following + // program: + // + // x[1] = 3; + // + // We would expect that the first entry of x is written to 3. But how can we + // actually achieve this? x[1] evaluates to a tensor... + // + // The answer is, using a ref-qualifier. x[1] is an rvalue, which cannot be + // (profitably) assigned to in the traditional sense, so we overload + // assignment to mean, "Actually, copy 3 into the tensor data." This is done + // with an rvalue-reference ref-qualified overload (the methods with && at the + // end of their type.) + // + // There's one more fly in the ointment: We also want + // + // Tensor x = y; + // + // to work, and we want it NOT to copy. So we need a traditional operator= + // overload. But we MUST specify a mutable lvalue ref-qualifier, to + // disambiguate the traditional overload from the rvalue-reference + // ref-qualified overload. Otherwise, it will be ambiguous, because + // a non ref-qualified method is eligible for all situations. + + // Unfortunately, we have to write these constructors out manually + // to work around an MSVC bug: + // error C2580: 'at::Tensor &at::Tensor::operator =(const at::Tensor &) &': + // multiple versions of a defaulted special member functions are not allowed + // Tensor& operator=(const Tensor&) & = default; + // Tensor& operator=(Tensor&&) & = default; + + // Also MSVC will wrongly issue the following warning with the aforementioned fix + // warning C4522: 'at::Tensor': multiple assignment operators specified + // Let's just skip the warning. 
+ // + // TODO: temporarily disabled + + Tensor& operator=(const TensorBase& x) & noexcept { + impl_ = x.getIntrusivePtr(); + return *this; + } + Tensor& operator=(TensorBase&& x) & noexcept { + impl_ = x.unsafeReleaseIntrusivePtr(); + return *this; + } + + Tensor& operator=(const Tensor &x) & noexcept { + return operator=(static_cast(x)); + } + Tensor& operator=(Tensor &&x) & noexcept { + return operator=(static_cast(x)); + } + + Tensor& operator=(const Scalar &v) && { + return fill_(v); + } + Tensor& operator=(const Tensor &rhs) && { + return copy_(rhs); + } + Tensor& operator=(Tensor&& rhs) && { + return copy_(rhs); + } + + C10_DEPRECATED_MESSAGE("Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device().") + DeprecatedTypeProperties & type() const { + return globalDeprecatedTypePropertiesRegistry().getDeprecatedTypeProperties( + dispatchKeyToBackend(legacyExtractDispatchKey(key_set())), + scalar_type()); + } + + Tensor toType(ScalarType t) const { + return to(options().dtype(t), /*non_blocking*/ false, /*copy*/ false); + } + + // TODO: Deprecate me + Tensor toBackend(Backend b) const { + return to(options().device(backendToDeviceType(b)).layout(layout_from_backend(b)), /*non_blocking*/ false, /*copy*/ false); + } + + C10_DEPRECATED_MESSAGE("Tensor.is_variable() is deprecated; everything is a variable now. (If you want to assert that variable has been appropriately handled already, use at::impl::variable_excluded_from_dispatch())") + bool is_variable() const noexcept { + return !at::impl::variable_excluded_from_dispatch(); + } + + template + C10_DEPRECATED_MESSAGE("Tensor.data() is deprecated. 
Please use Tensor.data_ptr() instead.") + T * data() const { + return data_ptr(); + } + + template + T item() const; + + template class PtrTraits = DefaultPtrTraits, typename index_t = int64_t> + C10_DEPRECATED_MESSAGE("packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead") + GenericPackedTensorAccessor packed_accessor() const & { + return generic_packed_accessor(); + } + template class PtrTraits = DefaultPtrTraits, typename index_t = int64_t> + C10_DEPRECATED_MESSAGE("packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead") + GenericPackedTensorAccessor packed_accessor() && = delete; + + Tensor operator~() const { + return bitwise_not(); + } + Tensor operator-() const { + return neg(); + } + Tensor& operator+=(const Tensor & other) { + return add_(other); + } + Tensor& operator+=(const Scalar & other) { + return add_(other); + } + Tensor& operator-=(const Tensor & other) { + return sub_(other); + } + Tensor& operator-=(const Scalar & other) { + return sub_(other); + } + Tensor& operator*=(const Tensor & other) { + return mul_(other); + } + Tensor& operator*=(const Scalar & other) { + return mul_(other); + } + Tensor& operator/=(const Tensor & other) { + return div_(other); + } + Tensor& operator/=(const Scalar & other) { + return div_(other); + } + Tensor& operator&=(const Tensor & other) { + return bitwise_and_(other); + } + Tensor& operator|=(const Tensor & other) { + return bitwise_or_(other); + } + Tensor& operator^=(const Tensor & other) { + return bitwise_xor_(other); + } + Tensor operator[](const Scalar & index) const { + if (!index.isIntegral(false)) { + TORCH_CHECK_INDEX(false, "Can only index tensors with integral scalars"); + } + return this->operator[](index.toLong()); + } + Tensor operator[](const Tensor & index) const { + // These properties are checked in the Scalar constructor, but we already + // check them here to provide more useful diagnostics for the user. + if (!index.defined()) { + TORCH_CHECK_INDEX(false, "Can only index with tensors that are defined"); + } + if (index.dim() != 0) { + TORCH_CHECK_INDEX(false, + "Can only index with tensors that are scalars (zero-dim)"); + } + // The Scalar(Tensor) constructor is explicit, so we need to call it. 
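+    // (Descriptive note: index.item() yields a Scalar here, which routes back
+    // through the integral-checking operator[](const Scalar&) overload above.)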
+ return this->operator[](index.item()); + } + Tensor operator[](int64_t index) const { + return select(0, index); + } + + Tensor index(ArrayRef indices) const; + Tensor index(std::initializer_list indices) const; + + Tensor & index_put_(ArrayRef indices, Tensor const & rhs); + Tensor & index_put_(ArrayRef indices, const Scalar& v); + Tensor & index_put_(std::initializer_list indices, Tensor const & rhs); + Tensor & index_put_(std::initializer_list indices, const Scalar& v); + + Tensor cpu() const { + return to(options().device(c10::DeviceType::CPU), /*non_blocking*/ false, /*copy*/ false); + } + + // TODO: The Python version also accepts arguments + Tensor cuda() const { + return to(options().device(c10::DeviceType::CUDA), /*non_blocking*/ false, /*copy*/ false); + } + + Tensor hip() const { + return to(options().device(c10::DeviceType::HIP), /*non_blocking*/ false, /*copy*/ false); + } + + Tensor ve() const { + return to(options().device(c10::DeviceType::VE), /*non_blocking*/ false, /*copy*/ false); + } + + Tensor vulkan() const { + return to(options().device(c10::DeviceType::Vulkan), /*non_blocking*/ false, /*copy*/ false); + } + + Tensor metal() const { + return to(options().device(c10::DeviceType::Metal), /*non_blocking*/ false, /*copy*/ false); + } + + Tensor meta() const { + return to(options().device(c10::DeviceType::Meta), /*non_blocking*/ false, /*copy*/ false); + } + + // ~~~~~ Autograd API ~~~~~ + + /// \fn bool is_leaf() const; + /// + /// All Tensors that have `requires_grad()` which is ``false`` will be leaf Tensors by convention. + /// + /// For Tensors that have `requires_grad()` which is ``true``, they will be leaf Tensors if they were + /// created by the user. This means that they are not the result of an operation and so + /// `grad_fn()` is `nullptr`. + /// + /// Only leaf Tensors will have their `grad()` populated during a call to `backward()`. + /// To get `grad()` populated for non-leaf Tensors, you can use `retain_grad()`. + /// + /// Example: + /// @code + /// auto a = torch::rand(10, torch::requires_grad()); + /// std::cout << a.is_leaf() << std::endl; // prints `true` + /// + /// auto b = torch::rand(10, torch::requires_grad()).to(torch::kCUDA); + /// std::cout << b.is_leaf() << std::endl; // prints `false` + /// // b was created by the operation that cast a cpu Tensor into a cuda Tensor + /// + /// auto c = torch::rand(10, torch::requires_grad()) + 2; + /// std::cout << c.is_leaf() << std::endl; // prints `false` + /// // c was created by the addition operation + /// + /// auto d = torch::rand(10).cuda(); + /// std::cout << d.is_leaf() << std::endl; // prints `true` + /// // d does not require gradients and so has no operation creating it (that is tracked by the autograd engine) + /// + /// auto e = torch::rand(10).cuda().requires_grad_(); + /// std::cout << e.is_leaf() << std::endl; // prints `true` + /// // e requires gradients and has no operations creating it + /// + /// auto f = torch::rand(10, torch::device(torch::kCUDA).requires_grad(true)); + /// std::cout << f.is_leaf() << std::endl; // prints `true` + /// // f requires grad, has no operation creating it + /// @endcode + + /// \fn void backward(const Tensor & gradient={}, std::optional retain_graph=std::nullopt, bool create_graph=false, std::optional inputs=std::nullopt) const; + /// + /// Computes the gradient of current tensor with respect to graph leaves. + /// + /// The graph is differentiated using the chain rule. If the tensor is + /// non-scalar (i.e. 
its data has more than one element) and requires + /// gradient, the function additionally requires specifying ``gradient``. + /// It should be a tensor of matching type and location, that contains + /// the gradient of the differentiated function w.r.t. this Tensor. + /// + /// This function accumulates gradients in the leaves - you might need to + /// zero them before calling it. + /// + /// \param gradient Gradient w.r.t. the + /// tensor. If it is a tensor, it will be automatically converted + /// to a Tensor that does not require grad unless ``create_graph`` is True. + /// None values can be specified for scalar Tensors or ones that + /// don't require grad. If a None value would be acceptable then + /// this argument is optional. + /// \param retain_graph If ``false``, the graph used to compute + /// the grads will be freed. Note that in nearly all cases setting + /// this option to True is not needed and often can be worked around + /// in a much more efficient way. Defaults to the value of + /// ``create_graph``. + /// \param create_graph If ``true``, graph of the derivative will + /// be constructed, allowing to compute higher order derivative + /// products. Defaults to ``false``. + /// \param inputs Inputs w.r.t. which the gradient will be accumulated into + /// ``at::Tensor::grad``. All other Tensors will be ignored. If not + /// provided, the gradient is accumulated into all the leaf Tensors + /// that were used to compute the current tensor. + /// When inputs are provided and a given input is not a leaf, + /// the current implementation will call its grad_fn (even though it is not strictly needed to get this gradients). + /// It is an implementation detail on which the user should not rely. + /// See https://github.com/pytorch/pytorch/pull/60521#issuecomment-867061780 for more details. + void backward(const Tensor & gradient={}, std::optional retain_graph=std::nullopt, bool create_graph=false, std::optional inputs=std::nullopt) const { + // NB: Adding this wrapper to _backward here because we'd like our + // 'backwards' api to accept the 'inputs' argument optionally. Since code gen + // currently does not support optional of TensorList our approach is to replace + // backward in native_functions.yaml with _backward and call it here instead. + if (inputs.has_value()) { + TORCH_CHECK(inputs.value().size() > 0, "'inputs' argument to backward cannot be empty") + this->_backward(inputs.value(), gradient, retain_graph, create_graph); + } else { + this->_backward({}, gradient, retain_graph, create_graph); + } + } + + /// \fn Tensor detach() const; + /// + /// Returns a new Tensor, detached from the current graph. + /// The result will never require gradient. + + /// \fn Tensor & detach_() const; + /// + /// Detaches the Tensor from the graph that created it, making it a leaf. + /// Views cannot be detached in-place. + + /// \fn void retain_grad() const; + /// + /// Enables this Tensor to have their :attr:`grad` populated during + /// :func:`backward`. This is a no-op for leaf tensors. + + /// \fn bool retains_grad() const; + /// + /// Is ``true`` if this Tensor is non-leaf and its :attr:`grad` is enabled to be + /// populated during :func:`backward`, ``false`` otherwise. + + const Tensor& set_requires_grad(bool requires_grad) const { + TensorBase::set_requires_grad(requires_grad); + return *this; + } + + /// Return a mutable reference to the gradient. This is conventionally + /// used as `t.grad() = x` to set a gradient to a completely new tensor. 
+ /// Note that this function work with a non-const Tensor and is not + /// thread safe. + Tensor& mutable_grad() const { + return impl_->mutable_grad(); + } + + /// This function returns an undefined tensor by default and returns a defined tensor + /// the first time a call to `backward()` computes gradients for this Tensor. + /// The attribute will then contain the gradients computed and future calls + /// to `backward()` will accumulate (add) gradients into it. + const Tensor& grad() const { + const Tensor& maybe_grad = impl_->grad(); + if (!is_leaf() && !retains_grad() && !maybe_grad.defined()) { + TORCH_WARN( + "The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad " + "attribute won't be populated during autograd.backward(). If you indeed want the .grad " + "field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. " + "If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor " + "instead. See github.com/pytorch/pytorch/pull/30531 for more information."); + } + return maybe_grad; + } + + // The Forward AD API functions below are low level and are not to be used by end + // users who should use the API provided in torch/csrc/autograd.h + + /// This function returns the forward gradient for this Tensor at the given level. + const Tensor& _fw_grad(uint64_t level) const { + return impl_->_fw_grad(level, *this); + } + + /// This function can be used to set the value of the forward grad. + /// Note that the given new_grad might not be used directly if it has different + /// metadata (size/stride/storage offset) compared to this Tensor. In that case, + /// new_grad content will be copied into a new Tensor + void _set_fw_grad(const TensorBase& new_grad, uint64_t level, bool is_inplace_op) const { + impl_->_set_fw_grad(new_grad, *this, level, is_inplace_op); + } + + + // STOP. Thinking of adding a method here, which only makes use + // of other ATen methods? Define it in native_functions.yaml. + + //example + //Tensor * add(Tensor & b); + ${tensor_method_declarations} + + // Special C++ only overloads for std()-like functions (See gh-40287) + // These are needed because int -> bool conversion takes precedence over int -> IntArrayRef + // So, for example std(0) would select the std(unbiased=False) overload + + Tensor var(int dim) const { + return var(IntArrayRef{dim}); + } + + Tensor std(int dim) const { + return std(IntArrayRef{dim}); + } + + // We changed .dtype() to return a TypeMeta in #12766. Ideally, we want the + // at::kDouble and its friends to be TypeMeta's, but that hasn't happened yet. + // Before that change, we make this method to maintain BC for C++ usage like + // `x.to(y.dtype)`. + // TODO: remove following two after at::kDouble and its friends are TypeMeta's. + inline Tensor to(caffe2::TypeMeta type_meta, bool non_blocking=false, bool copy=false) const { + return this->to(/*scalar_type=*/typeMetaToScalarType(type_meta), non_blocking, copy); + } + inline Tensor to(Device device, caffe2::TypeMeta type_meta, bool non_blocking=false, bool copy=false) const { + return this->to(device, /*scalar_type=*/typeMetaToScalarType(type_meta), non_blocking, copy); + } + + template + decltype(auto) m(F func, Args&&... params) const { + return func(*this, std::forward(params)...); + } + + /// NOTE: This is similar to the legacy `.data()` function on `Variable`, and is intended + /// to be used from functions that need to access the `Variable`'s equivalent `Tensor` + /// (i.e. 
`Tensor` that shares the same storage and tensor metadata with the `Variable`). + /// + /// One notable difference with the legacy `.data()` function is that changes to the + /// returned `Tensor`'s tensor metadata (e.g. sizes / strides / storage / storage_offset) + /// will not update the original `Variable`, due to the fact that this function + /// shallow-copies the `Variable`'s underlying TensorImpl. + at::Tensor tensor_data() const { + return TensorBase::tensor_data(); + } + + /// NOTE: `var.variable_data()` in C++ has the same semantics as `tensor.data` + /// in Python, which create a new `Variable` that shares the same storage and + /// tensor metadata with the original `Variable`, but with a completely new + /// autograd history. + /// + /// NOTE: If we change the tensor metadata (e.g. sizes / strides / + /// storage / storage_offset) of a variable created from `var.variable_data()`, those + /// changes will not update the original variable `var`. In `.variable_data()`, we set + /// `allow_tensor_metadata_change_` to false to make such changes explicitly illegal, + /// in order to prevent users from changing metadata of `var.variable_data()` + /// and expecting the original variable `var` to also be updated. + at::Tensor variable_data() const { + return TensorBase::variable_data(); + } + + // Hooks + //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + template + using hook_return_void_t = std::enable_if_t>::value, unsigned>; + template + using hook_return_var_t = std::enable_if_t, Tensor>, unsigned>; + + /// Registers a backward hook. + /// + /// The hook will be called every time a gradient with respect to the Tensor is computed. + /// The hook should have one of the following signature: + /// ``` + /// hook(Tensor grad) -> Tensor + /// ``` + /// ``` + /// hook(Tensor grad) -> void + /// ``` + /// The hook should not modify its argument, but it can optionally return a new gradient + /// which will be used in place of `grad`. + /// + /// This function returns the index of the hook in the list which can be used to remove hook. + /// + /// Example: + /// @code + /// auto v = torch::tensor({0., 0., 0.}, torch::requires_grad()); + /// auto h = v.register_hook([](torch::Tensor grad){ return grad * 2; }); // double the gradient + /// v.backward(torch::tensor({1., 2., 3.})); + /// // This prints: + /// // ``` + /// // 2 + /// // 4 + /// // 6 + /// // [ CPUFloatType{3} ] + /// // ``` + /// std::cout << v.grad() << std::endl; + /// v.remove_hook(h); // removes the hook + /// @endcode + template + hook_return_void_t register_hook(T&& hook) const; + template + hook_return_var_t register_hook(T&& hook) const; + + // Variable methods + //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + Tensor data() const { + return TensorBase::data(); + } + + void _backward(TensorList inputs, const std::optional& gradient, std::optional keep_graph, bool create_graph) const; + + const Tensor& requires_grad_(bool _requires_grad=true) const { + TensorBase::requires_grad_(_requires_grad); + return *this; + } +}; + +namespace detail { +// Helper creator for Tensor class which doesn't requires the users to pass +// in an intrusive_ptr instead it just converts the argument passed to +// requested intrusive_ptr type. +template +Tensor make_tensor(Args&&... 
args) { + return Tensor(c10::make_intrusive(std::forward(args)...)); +} + +} // namespace detail + +} // namespace at + + +namespace at { +${tensor_method_definitions} +} // namespace at + + +namespace c10 { +template <> +struct MaybeOwnedTraits { + using owned_type = at::Tensor; + using borrow_type = at::Tensor; + + static borrow_type createBorrow(const owned_type& from) { + // NOTE: this can be implemented without the special + // unsafe_borrow_t Tensor constructor as + // + // return borrow_type(c10::intrusive_ptr::reclaim(from.unsafeGetTensorImpl())); + // + // but that hurts inlining due to the nullptr check in the + // Tensor(c10::intrusive_ptr<...>) constructor. We already know + // that from.impl_ isn't null because from is a valid Tensor, so + // we needn't do the check again. (using __builtin_assume can + // avoid this, but wouldn't be portable to MSVC.) + return borrow_type(borrow_type::unsafe_borrow_t{}, from); + } + + static void assignBorrow(borrow_type& lhs, const borrow_type& rhs) { + lhs.unsafeReleaseTensorImpl(); + // See above note: this can be implemented with public API + // similarly to createBorrow(), but that would hurt inlining. + lhs = borrow_type(borrow_type::unsafe_borrow_t{}, rhs); + } + + static void destroyBorrow(borrow_type& toDestroy) { + toDestroy.unsafeReleaseTensorImpl(); // "leak" it, but it was already +0. + } + + static const owned_type& referenceFromBorrow(const borrow_type& borrow) { + return borrow; + } + + static const owned_type* pointerFromBorrow(const borrow_type& borrow) { + return &borrow; + } + + static bool debugBorrowIsValid(const borrow_type& /*borrow*/) { + return true; + } +}; + +template <> +struct ExclusivelyOwnedTraits { + using repr_type = at::Tensor; + using pointer_type = at::Tensor*; + using const_pointer_type = const at::Tensor*; + + static repr_type nullRepr() { + return at::Tensor(); + } + + template + static repr_type createInPlace(Args&&... args) { + return at::Tensor(std::forward(args)...); + } + + static repr_type moveToRepr(at::Tensor&& x) { + return std::move(x); + } + + static void destroyOwned(at::Tensor& x) { + return ExclusivelyOwnedTraits::destroyOwned(x); + } + + static at::Tensor take(at::Tensor& x) { + return std::move(x); + } + + static pointer_type getImpl(repr_type& x) { + return &x; + } + + static const_pointer_type getImpl(const repr_type& x) { + return &x; + } +}; +} // namespace c10 + +namespace at { + +inline c10::MaybeOwned borrow_from_optional_tensor( + const std::optional& opt) { + return opt.has_value() + ? c10::MaybeOwned::borrowed(*opt) + : c10::MaybeOwned::owned(std::in_place); +} + +inline c10::MaybeOwned Tensor::expect_contiguous(MemoryFormat memory_format) const & { + if (is_contiguous(memory_format)) { + return c10::MaybeOwned::borrowed(*this); + } else { + return c10::MaybeOwned::owned(__dispatch_contiguous(memory_format)); + } +} +} // namespace at diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/TensorMethods.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/TensorMethods.cpp new file mode 100644 index 0000000000000000000000000000000000000000..0504dccc385c9f3ad6ae3755df21aee1f476939b --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/TensorMethods.cpp @@ -0,0 +1,61 @@ +#include +#include + +#include + +namespace at { + +namespace { + +// Verifies the requested type is the same as the Tensor's type. 
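+// (Illustrative use, assuming a float tensor t: check_type(t, ScalarType::Float,
+// "float") passes, while check_type(t, ScalarType::Long, "long") trips the
+// TORCH_CHECK below. For quantized tensors the underlying integer type is also
+// accepted.)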
+void check_type(const TensorBase& tensor, ScalarType type, std::string_view type_name) { + TORCH_CHECK( + tensor.scalar_type() == type + || (isQIntType(tensor.scalar_type()) + && toUnderlying(tensor.scalar_type()) == type), + "expected scalar type ", type_name, " but found ", tensor.scalar_type()); +} + +} // namespace + +#define DEFINE_CAST(T, name) \ + template <> \ + TORCH_API const T* TensorBase::const_data_ptr() const { \ + check_type(*this, ScalarType::name, #name); \ + return this->unsafeGetTensorImpl()->data_ptr_impl(); \ + } \ + \ + template <> \ + TORCH_API const T* TensorBase::const_data_ptr() const { \ + check_type(*this, ScalarType::name, #name); \ + return this->unsafeGetTensorImpl()->data_ptr_impl>(); \ + } \ + \ + template <> \ + TORCH_API T* TensorBase::mutable_data_ptr() const { \ + check_type(*this, ScalarType::name, #name); \ + return this->unsafeGetTensorImpl()->mutable_data_ptr_impl(); \ + } \ + \ + template <> \ + TORCH_API T* TensorBase::data_ptr() const { \ + return mutable_data_ptr(); \ + } \ + + AT_FORALL_SCALAR_TYPES_WITH_COMPLEX(DEFINE_CAST) + AT_FORALL_QINT_TYPES(DEFINE_CAST) + DEFINE_CAST(uint16_t, UInt16) + DEFINE_CAST(uint32_t, UInt32) + DEFINE_CAST(uint64_t, UInt64) + #undef DEFINE_CAST + + #define DEFINE_ITEM(T, name) \ + template <> \ + TORCH_API T Tensor::item() const { \ + return item().to##name(); \ + } + + AT_FORALL_SCALAR_TYPES_WITH_COMPLEX(DEFINE_ITEM) + #undef DEFINE_ITEM + + } //namespace at diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/UfuncCPU.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/UfuncCPU.cpp new file mode 100644 index 0000000000000000000000000000000000000000..6b363a508907cc064e41794720657541fc28c301 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/UfuncCPU.cpp @@ -0,0 +1,19 @@ +#define TORCH_ASSERT_NO_OPERATORS + +#include +#include +#include + +namespace at { + +// NB: this is explicitly copied here (via codegen) rather than +// included via NativeFunctions.h to avoid recompiling this file when +// NativeFunctions.h changes +namespace meta { +${meta_declaration} +} + +namespace native { +${native_declaration} +${native_definitions} +}} // namespace at::native diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/UfuncCPUKernel.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/UfuncCPUKernel.cpp new file mode 100644 index 0000000000000000000000000000000000000000..0cac55664d6125287bdee0bd94c150462b81d5b9 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/UfuncCPUKernel.cpp @@ -0,0 +1,14 @@ +#define TORCH_ASSERT_NO_OPERATORS + +#include +#include +#include +#include +#include +#include +#include + +namespace at { +namespace native { +${native_definitions} +}} // namespace at::native diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/UfuncCUDA.cu b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/UfuncCUDA.cu new file mode 100644 index 0000000000000000000000000000000000000000..e75d82d9cc84bd8fddfd303f610412e5d0a98729 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/UfuncCUDA.cu @@ -0,0 +1,21 @@ +#define TORCH_ASSERT_NO_OPERATORS + +#include +#include +#include +#include +${cuda_headers} + +namespace at { + +// NB: this is explicitly copied here (via codegen) rather than +// 
included via NativeFunctions.h to avoid recompiling this file when +// NativeFunctions.h changes +namespace meta { +${meta_declaration} +} + +namespace native { +${native_declaration} +${native_definitions} +}} // namespace at::native diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/UnboxingFunctions.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/UnboxingFunctions.cpp new file mode 100644 index 0000000000000000000000000000000000000000..86c13235d8623964d734e743f5f15cf68a8df63c --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/UnboxingFunctions.cpp @@ -0,0 +1,35 @@ +#include <ATen/UnboxingFunctions.h> +#include <ATen/Functions.h> + +#include <ATen/Tensor.h> +#include <ATen/core/functional.h> +#include <ATen/core/interned_strings.h> +#include <ATen/core/ivalue.h> +#include <ATen/core/stack.h> + +#include <algorithm> +#include <array> +#include <cstddef> +#include <cstring> +#include <sstream> +#include <stdexcept> +#include <tuple> +#include <unordered_map> +#include <unordered_set> +#include <utility> +#include <vector> +namespace at { +namespace unboxing { + +using ::c10::fmap; +using ::c10::filter; +using torch::jit::peek; +using torch::jit::drop; +using torch::jit::pack; +using torch::jit::pop; + +// Generated function declaration +${definitions} + +} // namespace unboxing +} // namespace at diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/UnboxingFunctions.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/UnboxingFunctions.h new file mode 100644 index 0000000000000000000000000000000000000000..a65469a9b0123cbfd4075ff3c263276aa47f137f --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/UnboxingFunctions.h @@ -0,0 +1,32 @@ +// ${generated_comment} + +// Generated by tools/jit/gen_unboxing.py. This file declares code-generated boxed C++ functions for operators, +// based on native_functions.yaml (or a similar yaml file with the same syntax). The definition of such a boxed +// function will pop IValues from the stack and then convert them into the correct C++ types based on the given schema. This +// unboxing logic is an alternative to template-based metaprogramming unboxing.
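+// +// As an illustrative sketch (not actual generator output), the boxed wrapper +// for a schema like "aten::mul.Tensor(Tensor self, Tensor other) -> Tensor" +// peeks its arguments off the interpreter stack, converts them with IValue +// accessors, drops the inputs, and pushes the result back: +// +//   void mul_Tensor(Stack& stack) { +//     auto self = peek(stack, 0, 2).toTensor(); +//     auto other = peek(stack, 1, 2).toTensor(); +//     drop(stack, 2); +//     pack(stack, at::mul(self, other)); +//   }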
+ +#pragma once + +#include <ATen/ATen.h> +namespace at { +namespace unboxing { +namespace { + +template <typename T, size_t N> +std::array<T, N> as_array(const c10::List<c10::IValue>& list) { + std::array<T, N> res; + AT_ASSERT(list.size() == N); + std::vector<T> vec; + for (c10::IValue elem : list) { + vec.push_back(elem.to<T>()); + } + std::copy(vec.begin(), vec.end(), res.begin()); + return res; +} +} // namespace +using Stack = std::vector<c10::IValue>; +// Generated function declaration +${declarations} + +} // namespace unboxing +} // namespace at diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/ViewMetaClasses.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/ViewMetaClasses.cpp new file mode 100644 index 0000000000000000000000000000000000000000..0fd53171935f9147ba54bcd39a886e2f4dda6b2f --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/ViewMetaClasses.cpp @@ -0,0 +1,19 @@ +// ${generated_comment} + +#include <ATen/ViewMetaClasses.h> +#include <ATen/FunctionalInverses.h> + +#ifndef AT_PER_OPERATOR_HEADERS +#include <ATen/Operators.h> +#include <ATen/NativeFunctions.h> +#else +${op_headers} +#endif + +namespace at { +namespace functionalization { + +${view_meta_implementations} + +} // namespace functionalization +} // namespace at diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/ViewMetaClasses.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/ViewMetaClasses.h new file mode 100644 index 0000000000000000000000000000000000000000..be2dee2a871b35258864377fbac83e3037108b2b --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/ViewMetaClasses.h @@ -0,0 +1,12 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +// ${generated_comment} + +#include <ATen/FunctionalStorageImpl.h> + +namespace at { +namespace functionalization { + +${view_meta_declarations} + +} // namespace functionalization +} // namespace at diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/ViewMetaClassesPythonBinding.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/ViewMetaClassesPythonBinding.cpp new file mode 100644 index 0000000000000000000000000000000000000000..c784e5abe5c88dfb5bc418e60d48b28391274718 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/ViewMetaClassesPythonBinding.cpp @@ -0,0 +1,11 @@ +#include <ATen/ViewMetaClasses.h> +#include <torch/csrc/utils/pybind.h> + +namespace torch::functionalization { + +void initGenerated(PyObject* module) { + auto functionalization = py::handle(module).cast<py::module>(); + $view_meta_bindings +} + +} // namespace torch::functionalization diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/aten_interned_strings.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/aten_interned_strings.h new file mode 100644 index 0000000000000000000000000000000000000000..326d4622334a776f4f1f94fb49a70f2c53c7e6eb --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/aten_interned_strings.h @@ -0,0 +1,22 @@ +#pragma once + +// ${generated_comment} + +#if defined(TORCH_ASSERT_NO_OPERATORS) || defined(TORCH_ASSERT_ONLY_METHOD_OPERATORS) +#error This change adds a dependency on native_functions.yaml, \ + meaning the file will need to be re-compiled every time an operator \ + is changed or added. Consider if including <ATen/core/symbol.h> for \ + the c10::Symbol class would be sufficient, or if your change would be \ + better placed in another file.
+#endif + +// ATen symbols correspond exactly to operators defined in ATen. Every +// symbol here corresponds exactly to an ATen operation defined in +// native_functions.yaml; attributes are in one-to-one correspondence +// with their ATen name. + +#define FORALL_ATEN_BASE_SYMBOLS(_) \ +${aten_symbols} + +#define FORALL_ATTR_BASE_SYMBOLS(_) \ +${attr_symbols} diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/enum_tag.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/enum_tag.h new file mode 100644 index 0000000000000000000000000000000000000000..1320fbc28ab8f7d72655816292f49a4c9a9b727d --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/ATen/templates/enum_tag.h @@ -0,0 +1,10 @@ +#pragma once + +// ${generated_comment} + +namespace at { + // Enum of valid tags obtained from the entries in tags.yaml + enum class Tag { + ${enum_of_valid_tags} + }; +} diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/BUILD.bazel b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/BUILD.bazel new file mode 100644 index 0000000000000000000000000000000000000000..d1a0db360d230fe0f027c19869c6307f17010503 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/BUILD.bazel @@ -0,0 +1,4 @@ +load("//:tools/bazel.bzl", "rules") +load(":build.bzl", "define_targets") + +define_targets(rules = rules) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/README.md b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/README.md new file mode 100644 index 0000000000000000000000000000000000000000..bfa43899cc590959c2bfd74e38662ec03aaee3d6 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/README.md @@ -0,0 +1,3 @@ +If you add a file to this directory, you **MUST** update +`torch/CMakeLists.txt` and add the file as a dependency to +the `add_custom_command` call. 
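The FORALL_ATEN_BASE_SYMBOLS and FORALL_ATTR_BASE_SYMBOLS macros in aten_interned_strings.h above follow the X-macro pattern: they expand a caller-supplied callback once per generated symbol. A minimal sketch of a consumer, assuming each generated entry has the two-argument form `_(aten, abs)`:

    // Count the generated base symbols by expanding the callback once per
    // (namespace, name) pair. Illustrative only; real consumers live in
    // ATen's interned-strings machinery.
    #define COUNT_ONE(ns, s) +1
    constexpr int kNumAtenBaseSymbols = 0 FORALL_ATEN_BASE_SYMBOLS(COUNT_ONE);
    #undef COUNT_ONE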
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..98566bb4b6bf75648bbcd23111b8c62c30ef21fb Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/context.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/context.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..760852ae430347401706ec05ed693170ee4d9f61 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/context.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_annotated_fn_args.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_annotated_fn_args.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..a568cfb7268517f849197d74730b1cd99871670a Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_annotated_fn_args.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_autograd.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_autograd.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..9fd9f78133f9aa7695fedd0e750b5cff6bbab84c Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_autograd.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_autograd_functions.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_autograd_functions.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..490652fd5b27249304223c25840a7e944776985e Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_autograd_functions.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_inplace_or_view_type.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_inplace_or_view_type.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..eecd96ef686be90345c12a56e23298e46fa1a011 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_inplace_or_view_type.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_python_functions.cpython-312.pyc 
b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_python_functions.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..d27b330b299c0c5151ffe59e11756211d09d9bd8 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_python_functions.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_trace_type.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_trace_type.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..3c082f0f3f489e11d5623135f8a84a0faa832ad4 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_trace_type.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_variable_factories.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_variable_factories.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..89b9377558b351111db613c2475821103bff6e52 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_variable_factories.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_variable_type.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_variable_type.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..97f3dcbb417333736813ce886a8beb4b5a6bef03 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_variable_type.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_view_funcs.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_view_funcs.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..ac1fead63b7e0ad8475b862d690e0d690935b4d9 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/gen_view_funcs.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/load_derivatives.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/load_derivatives.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..da8252378a6f67cdfd9382575326541f693836fb Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/__pycache__/load_derivatives.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/build.bzl b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/build.bzl new file mode 100644 index 0000000000000000000000000000000000000000..c5ddf7a20b800a714431fdc9feb57679783410f4 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/build.bzl @@ -0,0 +1,20 @@ +def define_targets(rules): + rules.py_library( + name = "autograd", + srcs = rules.glob(["*.py"]), + data = rules.glob([ + "*.yaml", + "templates/*", + ]), + visibility = 
["//:__subpackages__"], + deps = [ + rules.requirement("PyYAML"), + "//torchgen", + ], + ) + + rules.filegroup( + name = "deprecated_yaml", + srcs = ["deprecated.yaml"], + visibility = ["//:__subpackages__"], + ) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/context.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/context.py new file mode 100644 index 0000000000000000000000000000000000000000..0ed4b2ee4d014be3dca01c3f2293b36b03b7880b --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/context.py @@ -0,0 +1,31 @@ +import functools +from collections.abc import Callable + +from torchgen.api.autograd import NativeFunctionWithDifferentiabilityInfo as NFWDI +from torchgen.context import native_function_manager +from torchgen.utils import T + + +# Like tools.api.context.with_native_function, but for +# NativeFunctionWithDifferentiabilityInfo. +def with_native_function_with_differentiability_info( + func: Callable[[NFWDI], T], +) -> Callable[[NFWDI], T]: + @functools.wraps(func) + def wrapper(f: NFWDI) -> T: + with native_function_manager(f.func): + return func(f) + + return wrapper + + +# Like the above but with an additional dispatch key string argument +def with_native_function_with_differentiability_info_and_key( + func: Callable[[NFWDI, str], T], +) -> Callable[[NFWDI, str], T]: + @functools.wraps(func) + def wrapper(f: NFWDI, key: str) -> T: + with native_function_manager(f.func): + return func(f, key) + + return wrapper diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/deprecated.yaml b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/deprecated.yaml new file mode 100644 index 0000000000000000000000000000000000000000..52f7ec50b6ea15dae1c3308358997950d295c924 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/deprecated.yaml @@ -0,0 +1,134 @@ +# Deprecated function signatures. These are exposed in Python, but not included +# in the error message suggestions. + +- name: add(Tensor self, Scalar alpha, Tensor other) -> Tensor + aten: add(self, other, alpha) + +- name: add_(Tensor(a!) self, Scalar alpha, Tensor other) -> Tensor(a!) + aten: add_(self, other, alpha) + +- name: add(Tensor self, Scalar alpha, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + aten: add_out(out, self, other, alpha) + +- name: addbmm(Scalar beta, Tensor self, Scalar alpha, Tensor batch1, Tensor batch2) -> Tensor + aten: addbmm(self, batch1, batch2, beta, alpha) + +- name: addbmm_(Scalar beta, Tensor(a!) self, Scalar alpha, Tensor batch1, Tensor batch2) -> Tensor(a!) + aten: addbmm_(self, batch1, batch2, beta, alpha) + +- name: addbmm(Scalar beta, Tensor self, Scalar alpha, Tensor batch1, Tensor batch2, *, Tensor(a!) out) -> Tensor(a!) + aten: addbmm_out(out, self, batch1, batch2, beta, alpha) + +- name: addbmm(Scalar beta, Tensor self, Tensor batch1, Tensor batch2) -> Tensor + aten: addbmm(self, batch1, batch2, beta, 1) + +- name: addbmm_(Scalar beta, Tensor(a!) self, Tensor batch1, Tensor batch2) -> Tensor(a!) + aten: addbmm_(self, batch1, batch2, beta, 1) + +- name: addbmm(Scalar beta, Tensor self, Tensor batch1, Tensor batch2, *, Tensor(a!) out) -> Tensor(a!) + aten: addbmm_out(out, self, batch1, batch2, beta, 1) + +- name: addcdiv(Tensor self, Scalar value, Tensor tensor1, Tensor tensor2) -> Tensor + aten: addcdiv(self, tensor1, tensor2, value) + +- name: addcdiv_(Tensor(a!) 
self, Scalar value, Tensor tensor1, Tensor tensor2) -> Tensor(a!) + aten: addcdiv_(self, tensor1, tensor2, value) + +- name: addcdiv(Tensor self, Scalar value, Tensor tensor1, Tensor tensor2, *, Tensor(a!) out) -> Tensor(a!) + aten: addcdiv_out(out, self, tensor1, tensor2, value) + +- name: addcmul(Tensor self, Scalar value, Tensor tensor1, Tensor tensor2) -> Tensor + aten: addcmul(self, tensor1, tensor2, value) + +- name: addcmul_(Tensor(a!) self, Scalar value, Tensor tensor1, Tensor tensor2) -> Tensor(a!) + aten: addcmul_(self, tensor1, tensor2, value) + +- name: addcmul(Tensor self, Scalar value, Tensor tensor1, Tensor tensor2, *, Tensor(a!) out) -> Tensor(a!) + aten: addcmul_out(out, self, tensor1, tensor2, value) + +- name: addmm(Scalar beta, Tensor self, Scalar alpha, Tensor mat1, Tensor mat2) -> Tensor + aten: addmm(self, mat1, mat2, beta, alpha) + +- name: addmm_(Scalar beta, Tensor(a!) self, Scalar alpha, Tensor mat1, Tensor mat2) -> Tensor(a!) + aten: addmm_(self, mat1, mat2, beta, alpha) + +- name: addmm(Scalar beta, Tensor self, Scalar alpha, Tensor mat1, Tensor mat2, *, Tensor(a!) out) -> Tensor(a!) + aten: addmm_out(out, self, mat1, mat2, beta, alpha) + +- name: addmm(Scalar beta, Tensor self, Tensor mat1, Tensor mat2) -> Tensor + aten: addmm(self, mat1, mat2, beta, 1) + +- name: addmm_(Scalar beta, Tensor(a!) self, Tensor mat1, Tensor mat2) -> Tensor(a!) + aten: addmm_(self, mat1, mat2, beta, 1) + +- name: addmm(Scalar beta, Tensor self, Tensor mat1, Tensor mat2, *, Tensor(a!) out) -> Tensor(a!) + aten: addmm_out(out, self, mat1, mat2, beta, 1) + +- name: sspaddmm(Scalar beta, Tensor self, Scalar alpha, Tensor mat1, Tensor mat2) -> Tensor + aten: sspaddmm(self, mat1, mat2, beta, alpha) + +- name: sspaddmm(Scalar beta, Tensor self, Tensor mat1, Tensor mat2) -> Tensor + aten: sspaddmm(self, mat1, mat2, beta, 1) + +- name: addmv(Scalar beta, Tensor self, Scalar alpha, Tensor mat, Tensor vec) -> Tensor + aten: addmv(self, mat, vec, beta, alpha) + +- name: addmv_(Scalar beta, Tensor(a!) self, Scalar alpha, Tensor mat, Tensor vec) -> Tensor(a!) + aten: addmv_(self, mat, vec, beta, alpha) + +- name: addmv(Scalar beta, Tensor self, Scalar alpha, Tensor mat, Tensor vec, *, Tensor(a!) out) -> Tensor(a!) + aten: addmv_out(out, self, mat, vec, beta, alpha) + +- name: addmv(Scalar beta, Tensor self, Tensor mat, Tensor vec) -> Tensor + aten: addmv(self, mat, vec, beta, 1) + +- name: addmv_(Scalar beta, Tensor(a!) self, Tensor mat, Tensor vec) -> Tensor(a!) + aten: addmv_(self, mat, vec, beta, 1) + +- name: addmv(Scalar beta, Tensor self, Tensor mat, Tensor vec, *, Tensor(a!) out) -> Tensor(a!) + aten: addmv_out(out, self, mat, vec, beta, 1) + +- name: addr(Scalar beta, Tensor self, Scalar alpha, Tensor vec1, Tensor vec2) -> Tensor + aten: addr(self, vec1, vec2, beta, alpha) + +- name: addr_(Scalar beta, Tensor(a!) self, Scalar alpha, Tensor vec1, Tensor vec2) -> Tensor(a!) + aten: addr_(self, vec1, vec2, beta, alpha) + +- name: addr(Scalar beta, Tensor self, Scalar alpha, Tensor vec1, Tensor vec2, *, Tensor(a!) out) -> Tensor(a!) + aten: addr_out(out, self, vec1, vec2, beta, alpha) + +- name: addr(Scalar beta, Tensor self, Tensor vec1, Tensor vec2) -> Tensor + aten: addr(self, vec1, vec2, beta, 1) + +- name: addr_(Scalar beta, Tensor(a!) self, Tensor vec1, Tensor vec2) -> Tensor(a!) + aten: addr_(self, vec1, vec2, beta, 1) + +- name: addr(Scalar beta, Tensor self, Tensor vec1, Tensor vec2, *, Tensor(a!) out) -> Tensor(a!) 
+ aten: addr_out(out, self, vec1, vec2, beta, 1) + +- name: baddbmm(Scalar beta, Tensor self, Scalar alpha, Tensor batch1, Tensor batch2) -> Tensor + aten: baddbmm(self, batch1, batch2, beta, alpha) + +- name: baddbmm_(Scalar beta, Tensor(a!) self, Scalar alpha, Tensor batch1, Tensor batch2) -> Tensor(a!) + aten: baddbmm_(self, batch1, batch2, beta, alpha) + +- name: baddbmm(Scalar beta, Tensor self, Scalar alpha, Tensor batch1, Tensor batch2, *, Tensor(a!) out) -> Tensor(a!) + aten: baddbmm_out(out, self, batch1, batch2, beta, alpha) + +- name: baddbmm(Scalar beta, Tensor self, Tensor batch1, Tensor batch2) -> Tensor + aten: baddbmm(self, batch1, batch2, beta, 1) + +- name: baddbmm_(Scalar beta, Tensor(a!) self, Tensor batch1, Tensor batch2) -> Tensor(a!) + aten: baddbmm_(self, batch1, batch2, beta, 1) + +- name: baddbmm(Scalar beta, Tensor self, Tensor batch1, Tensor batch2, *, Tensor(a!) out) -> Tensor(a!) + aten: baddbmm_out(out, self, batch1, batch2, beta, 1) + +- name: sub(Tensor self, Scalar alpha, Tensor other) -> Tensor + aten: sub(self, other, alpha) + +- name: sub_(Tensor(a!) self, Scalar alpha, Tensor other) -> Tensor(a!) + aten: sub_(self, other, alpha) + +- name: sub(Tensor self, Scalar alpha, Tensor other, *, Tensor(a!) out) -> Tensor(a!) + aten: sub_out(out, self, other, alpha) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/derivatives.yaml b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/derivatives.yaml new file mode 100644 index 0000000000000000000000000000000000000000..88e0a316f9d09c49d7ec370cff912bba59c27136 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/derivatives.yaml @@ -0,0 +1,3242 @@ +# Defines derivative formulas and Python signatures of methods on Variable +# +# Note about possibly confusing nomenclature: An 'output gradient' is the +# gradient of an output of a forward function. Output gradients are used as +# the inputs to backward functions. `grads` is a vector of output gradients, +# and `grad == grads[0]`, in all the derivative formulas in this file. +# An 'input gradient' is the gradient of an input to a forward function. +# Input gradients are the outputs of backward functions, corresponding to the +# input names included in the derivative formulas defined in this file. +# Also, every time we talk about computing a "gradient" we actually mean computing +# the vector-Jacobian product using the given 'output gradient' as the vector. +# +# Each entry consists of: +# - A 'name', which specifies the ATen name of the function you +# are defining derivatives for, and an argument specification. +# - An optional 'dispatch' entry which can be used to specify +# per-autograd dispatch key derivatives. If this entry is not +# specified, then the gradient entries will be taken as the +# default gradients (i.e. registered for every backward dispatch +# key). (see _test_autograd_multiple_dispatch for an example +# of how to register separate derivatives for different dispatch keys). +# The list of allowed dispatch keys (in addition to 'Default' which +# represents the Autograd alias key) is torchgen/model.py:AUTOGRAD_KEYS. +# - One or more gradients entries, mapping differentiable input +# names to a formula specifying how to compute its gradient. +# Note that a single gradient entry can specify the gradient +# formula for multiple input names, by specifying a key +# "input1, input2" (see atan2 for an example). +# - An argument can be flagged as 'non_differentiable'.
+# - Optional entry with key 'output_differentiability' and value a list of the +# same length as the number of outputs from the forward function. The list +# should contain only booleans, specifying whether each output Tensor +# is differentiable. +# If it is not specified for a function that returns multiple elements but +# uses `grad` instead of `grads[idx]`, then all but the first output will +# be marked as non-differentiable. +# If none of the outputs are differentiable, you can also add the function +# name to `gen_variable_type.py`'s `DONT_REQUIRE_DERIVATIVE` list. +# +# There are two cases for Tensor and TensorList arguments here: +# - If that argument is differentiable, in the sense that a gradient with respect +# to that argument could exist. You should either: +# - Specify the formula for that gradient +# - Specify not_implemented("function_name") as a formula to say that this is not +# implemented yet (but might be in the future and the user can request that on an issue) +# - If that argument is not differentiable, because it is not a floating point dtype or because the +# function is not differentiable with respect to that argument, for +# example. You should either: +# - Not specify any formula for this argument +# - Specify explicitly that this argument is "non_differentiable". Note that in this case, +# we trust you that this argument will never have requires_grad=True and it will be silently +# ignored if it does. +# +# If a function has out-of-place and in-place variants, then the derivative +# definition for the in-place variant is optional. It will default to the +# definition for the out-of-place variant. Note that _out variants are never +# differentiable. +# +# Gradient expressions are standard C++ expressions operating on ATen +# variables. In a gradient expression, the following variables/functions +# are in scope: +# +# - 'grad', the gradient of the output (often spelled grad_output +# in Python) which we are going to left-multiply. +# +# When a function returns multiple *differentiable* outputs, +# you can refer to the gradients of each output using 'grads', +# e.g., 'grads[0]', 'grads[1]'. +# +# When a function returns multiple *differentiable* outputs that +# are named, you can refer to the gradients of each output using +# 'grad_{name}', e.g., 'grad_x', 'grad_y'. +# +# When a function returns *one* differentiable output (the +# first output) and some more nondifferentiable outputs, +# you MUST refer to the gradient of the differentiable output with +# 'grad' (this case is special-cased in our code generation). +# +# Note that the number of differentiable outputs can be modified by the +# 'output_differentiability' entry (see above). +# +# Across a differentiable function's derivatives set, it is not +# permitted to mix the use of "grad", "grads", and +# "grad_{name}". You must be consistent for that differentiable +# function. +# +# - Any of the input arguments, tensor or non-tensor, including +# argument names that only appear in Declarations.yaml, e.g. 'output'. +# +# - 'result', representing the result of evaluating the forward +# expression for ATen native function declarations. If the forward +# expression outputs a tuple, use 'resultX' instead to access the +# X-th entry +# +# - 'grad_input_mask', a std::array<bool, n>, specifies which input +# gradients are actually needed.
For example, in the entry +# `input0, input1: foo(grad_input_mask)`, `grad_input_mask` is a size +# two array, where `grad_input_mask[0]` is true if `input0` requires +# grad, and `grad_input_mask[1]` is true if `input1` requires grad. +# +# (NB: if your function computes the gradient for a list of tensors, +# the `grad_input_mask` will only have a single entry for the list, +# specifying if either zero or at least one tensor from the list requires +# grad. If we want to support more fine-grained signalling, +# we'll need some alternate variable which is not a std::array) +# +# - 'retain_variables', a bool which is true if a user has specified +# that saved variables should be retained in case the backwards is +# run again later. This allows an optimization where we can +# destroy saved buffers if we know variables are not going to be retained, +# e.g., it is used by _cudnn_rnn +# +# - `wrap_opt_if` is a 2-argument function that accepts a tensor +# variable and a boolean condition that dictates whether to save that +# variable in a graph. The result of this function is `std::optional<Tensor>`, +# and it is `::std::nullopt` when the condition evaluates to `false`, +# otherwise it is the variable wrapped in `std::optional<Tensor>`. +# For example, wrap_opt_if(var_0, grad_input_mask[1] || grad_input_mask[2]) +# would mean that `var_0` is saved as long as the second (grad_input_mask[1]) +# or the third (grad_input_mask[2]) argument requires gradients. +# Another interpretation of this expression would read as `var_0` is needed +# in the backward computation of the second or the third argument. +# NOTE: the usage of `var_i.requires_grad()` in the conditional expression +# is not supported, use `grad_input_mask[i]` instead. +# NOTE: `wrap_opt_if` could be used to prevent saving redundant variables +# with multi-output backward formulas. +# See https://github.com/pytorch/pytorch/issues/97575 for more details +# on the issue. +# +# If you need a complex expression, e.g., with local variables, +# write a _backward function in torch/csrc/autograd/FunctionsManual.cpp +# and invoke it from here. By the way, go read +# https://github.com/zdevito/ATen/issues/163; this describes an +# important hazard that occurs when porting backwards from Python to C++ +# +# Double backwards gradient expressions can be somewhat confusing; +# the most important thing to remember is: (1) you need to define a +# derivative formula for every input, including inputs named things +# like 'grad_output', and (2) the gradient to multiply with is always +# called 'grad' (even though it really is a grad-grad). +# +# You can also add a forward derivative definition by defining a formula for +# a returned value (in general "result" if the name is not specified). This +# formula works the same way as the backward one and advanced implementations +# should also be placed in the FunctionsManual file. +# This formula should compute a single Jacobian-vector product using the (primal) +# value of the argument "foo_p", its forward grad "foo_t" and the result of the +# function as "result". +# Note that the forward derivative can be automatically generated in two cases: +# - if your function is linear (NOT affine or multi-linear), then you can +# specify so by just using the string "auto_linear" for the formula. +# - if your function is applied element-wise (and has a single input), you +# can specify so by just using the string "auto_element_wise" for the formula.
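+# +# For instance, a mathematically equivalent sketch of such an entry (the real +# file expresses most formulas through helpers in FunctionsManual.cpp) could +# combine a backward formula with an element-wise forward derivative: +# +# - name: tanh(Tensor self) -> Tensor +#   self: grad * (1 - result * result).conj() +#   result: auto_element_wise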
+# +# Note that to avoid unpacking overhead, functions taking TensorList as inputs +# will always have their forward grad formula called. This function is responsible +# for checking whether any computation is needed and should return an undefined Tensor when +# there is nothing to do. You can check "cat_forward" for a full example. +# +# NB: There are a number of gradient definitions in here which are bogus +# (implemented using zeros_like). These gradients are (hopefully) not +# used by our frontend. You MUST check the frontend code; search for +# OpName.apply to see if it's still using a legacy Python style API. +# +# Note: Returning views. +# The following cases exist: +# - If a function returns no view, it can have arbitrary outputs. +# - If a function returns at least one Tensor that is a differentiable view +# of one of its inputs: +# - If there is only one differentiable output, this Tensor is marked as a +# differentiable view. (alias or transpose for example) +# - If there is more than one differentiable output, by default all the views are +# marked as differentiable views and created with allow_rebase_history=false, +# meaning that any in-place operation on them will raise an error. (unbind for example) +# +# Notes about undefined output gradients: +# All backward functions must support all combinations of undefined output +# gradient Tensors, where `grad[i].defined() == false`. Depending on the +# number of input and output grads your derivative formula uses, code +# generation may automatically add some level of undefined grad support, +# according to these three cases: +# +# * 1 input grad and 1 output grad: +# Complete undefined grad support is automatically added, so you +# shouldn't have to think about it, unless there is a bug in the code +# generation. +# +# * 1 input grad and multiple output grads: +# Undefined grad support is automatically added ONLY in the case where +# all output grads are undefined. You will have to add explicit support +# for cases where a subset of output grads is undefined. +# +# * multiple input grads: +# No automatic support, so you will need to add it. +# +# If your derivative formula uses more than one output grad, it is usually +# preferable to add undefined grad support in the backward function itself +# (if you're using one), rather than in the derivative formula in this file. +# +# Undefined Tensors are created with the default constructor `at::Tensor()`. +# It is an efficient way to represent a Tensor filled with zeros because +# the Tensor holds no sizing information and no Storage data is allocated. +# But consequently, Tensor operations cannot be performed on them. +# Therefore, your backward function should treat an undefined output grad as +# zero, handling it as a special case. +# +# If all output grads are undefined, then it should be correct for the +# backward function to return undefined input grads. Since we use the chain +# rule, output grads equal to zero should result in input grads equal to zero, +# unless there is some rare special case. +# +# If a subset of output grads is undefined, then it may be acceptable for +# the backward function to return undefined input grads--it depends on the +# specific function, so you'll have to determine that yourself. If returning +# an undefined Tensor is correct for a given input grad, it is also logically +# correct to return a defined grad full of zeros, but that would not be +# preferable since it would be less efficient.
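+# +# For example, several multi-input entries below handle possibly-undefined +# output grads by guarding on grad.defined() and falling back to a tuple of +# undefined Tensors, as in +# "grad.defined() ? conv_tbc_backward(grad, self, weight, bias, pad) : std::tuple<Tensor, Tensor, Tensor>()".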
+# +# NB: The parameter names here MUST be consistent with the parameter names +# in native_functions.yaml +- name: abs(Tensor self) -> Tensor + self: grad * self.sgn() + result: handle_r_to_c(result.scalar_type(), self_t.conj() * self_p.sgn()) + +- name: acos(Tensor self) -> Tensor + self: grad * -((-self * self + 1).rsqrt()).conj() + result: auto_element_wise + +- name: add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor + self: handle_r_to_c(self.scalar_type(), grad) + other: handle_r_to_c(other.scalar_type(), maybe_multiply(grad, alpha.conj())) + result: self_t + maybe_multiply(other_t, alpha) + +- name: add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> Tensor + self: handle_r_to_c(self.scalar_type(), grad) + result: self_t.clone() + +- name: addbmm(Tensor self, Tensor batch1, Tensor batch2, *, Scalar beta=1, Scalar alpha=1) -> Tensor + self: maybe_multiply(grad, beta.conj()) + batch1: maybe_multiply(grad.unsqueeze(0).expand_symint({ batch1.sym_size(0), batch1.sym_size(1), batch2.sym_size(2) }).bmm(batch2.transpose(1, 2).conj()), alpha.conj()) + batch2: maybe_multiply(batch1.transpose(1, 2).conj().bmm(grad.unsqueeze(0).expand_symint({ batch1.sym_size(0), batch1.sym_size(1), batch2.sym_size(2) })), alpha.conj()) + result: maybe_multiply(self_t, beta) + maybe_multiply(batch1_t.bmm(batch2_p).sum(0), alpha) + maybe_multiply(batch1_p.bmm(batch2_t).sum(0), alpha) + +- name: addcdiv(Tensor self, Tensor tensor1, Tensor tensor2, *, Scalar value=1) -> Tensor + self: handle_r_to_c(self.scalar_type(), grad) + tensor1: handle_r_to_c(tensor1.scalar_type(), grad * (value / tensor2).conj()) + tensor2: handle_r_to_c(tensor2.scalar_type(), -grad * (value * tensor1 / (tensor2 * tensor2)).conj()) + result: self_t + maybe_multiply(tensor1_t / tensor2_p, value) - maybe_multiply(tensor2_t * (tensor1_p / tensor2_p) / tensor2_p, value) + +- name: addcmul(Tensor self, Tensor tensor1, Tensor tensor2, *, Scalar value=1) -> Tensor + self: handle_r_to_c(self.scalar_type(), grad) + tensor1: handle_r_to_c(tensor1.scalar_type(), grad * (tensor2 * value).conj()) + tensor2: handle_r_to_c(tensor2.scalar_type(), grad * (tensor1 * value).conj()) + result: self_t + maybe_multiply(tensor1_t * tensor2_p, value) + maybe_multiply(tensor2_t * tensor1_p, value) + +- name: addmm(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> Tensor + self: maybe_multiply(grad, beta.conj()) + mat1: mm_mat1_backward(grad, mat2, mat1.sym_sizes(), mat1.sym_strides(), mat1.layout(), alpha) + mat2: mm_mat2_backward(grad, mat1, mat2.sym_sizes(), mat2.sym_strides(), mat2.layout(), alpha) + result: maybe_multiply(self_t, beta) + maybe_multiply(mat1_t.mm(mat2_p), alpha) + maybe_multiply(mat1_p.mm(mat2_t), alpha) + +- name: _sparse_addmm(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> Tensor + self: maybe_multiply(grad, beta) + mat1: mm_mat1_sparse_backward(grad, mat1, mat2, alpha) + mat2: mm_mat2_backward(grad, mat1, mat2.sym_sizes(), mat2.sym_strides(), mat2.layout(), alpha) + +- name: addmv(Tensor self, Tensor mat, Tensor vec, *, Scalar beta=1, Scalar alpha=1) -> Tensor + self: maybe_multiply(grad, beta.conj()) + mat: maybe_multiply(grad.ger(vec.conj()), alpha.conj()) + vec: maybe_multiply(mat.t().conj().mv(grad), alpha.conj()) + result: maybe_multiply(self_t, beta) + maybe_multiply(mat_t.mv(vec_p), alpha) + maybe_multiply(mat_p.mv(vec_t), alpha) + +- name: addr(Tensor self, Tensor vec1, Tensor vec2, *, Scalar beta=1, Scalar alpha=1) -> Tensor + self: maybe_multiply(grad, 
beta.conj()) + vec1: maybe_multiply(grad.mv(vec2.conj()), alpha.conj()) + vec2: maybe_multiply(grad.t().mv(vec1.conj()), alpha.conj()) + result: maybe_multiply(self_t, beta) + maybe_multiply(vec1_t.outer(vec2_p), alpha) + maybe_multiply(vec1_p.outer(vec2_t), alpha) + +- name: affine_grid_generator(Tensor theta, SymInt[] size, bool align_corners) -> Tensor + theta: affine_grid_generator_backward_symint(grad, size, align_corners) + result: auto_linear + +- name: alias(Tensor(a) self) -> Tensor(a) + self: grad + result: self_t + +- name: angle(Tensor self) -> Tensor + self: angle_backward(grad, self) + result: handle_r_to_c(result.scalar_type(), angle_backward(self_t.conj(), self_p).conj()) + +# The four items below are necessary because TensorIterator doesn't work on +# Variables (codegen does not unwrap the input Tensor for all() and any() ). +- name: any(Tensor self) -> Tensor + output_differentiability: [False] + +- name: any.dim(Tensor self, int dim, bool keepdim=False) -> Tensor + output_differentiability: [False] + +- name: any.dims(Tensor self, int[]? dim=None, bool keepdim=False) -> Tensor + output_differentiability: [False] + +- name: _is_all_true(Tensor self) -> Tensor + self: non_differentiable + +- name: _is_any_true(Tensor self) -> Tensor + self: non_differentiable + +- name: all(Tensor self) -> Tensor + output_differentiability: [False] + +- name: all.dim(Tensor self, int dim, bool keepdim=False) -> Tensor + output_differentiability: [False] + +- name: all.dims(Tensor self, int[]? dim=None, bool keepdim=False) -> Tensor + output_differentiability: [False] + +- name: acosh(Tensor self) -> Tensor +# Save one rsqrt in the real case by using that for x real and positive sqrt(x*y) = sqrt(x)*sqrt(y) (not true in the complex case) + self: "self.is_complex() ? grad * ((self + 1).rsqrt() * (self - 1).rsqrt()).conj() : grad * (self * self - 1).rsqrt()" + result: auto_element_wise + +- name: acosh_(Tensor(a!) self) -> Tensor(a!) + self: not_implemented("inplace version of acosh") + +- name: asinh(Tensor self) -> Tensor + self: grad * (self.pow(2) + 1).rsqrt().conj() + result: auto_element_wise + +- name: asinh_(Tensor(a!) self) -> Tensor(a!) + self: not_implemented("inplace version of asinh") + +- name: atanh(Tensor self) -> Tensor + self: grad * 1 / (1 - self.pow(2)).conj() + result: auto_element_wise + +- name: atanh_(Tensor(a!) self) -> Tensor(a!) + self: not_implemented("inplace version of atanh") + +- name: as_strided(Tensor(a) self, SymInt[] size, SymInt[] stride, SymInt? storage_offset=None) -> Tensor(a) + self: as_strided_backward(grad, TensorGeometry(self), size, stride, storage_offset) + result: auto_linear + +- name: as_strided_(Tensor(a!) self, SymInt[] size, SymInt[] stride, SymInt? storage_offset=None) -> Tensor(a!) 
+ self: as_strided_backward(grad, TensorGeometry(self), size, stride, storage_offset) + result: auto_linear + +- name: asin(Tensor self) -> Tensor + self: grad * (-self * self + 1).rsqrt().conj() + result: auto_element_wise + +- name: atan(Tensor self) -> Tensor + self: grad / (self * self + 1).conj() + result: auto_element_wise + +- name: atan2(Tensor self, Tensor other) -> Tensor + self, other: atan2_backward(grad, self, other, grad_input_mask) + result: (-self_p * other_t + other_p * self_t) / (self_p.pow(2) + other_p.pow(2)) + +- name: baddbmm(Tensor self, Tensor batch1, Tensor batch2, *, Scalar beta=1, Scalar alpha=1) -> Tensor + self: maybe_multiply(grad, beta.conj()) + batch1: maybe_multiply(grad.bmm(batch2.transpose(1, 2).conj()), alpha.conj()) + batch2: maybe_multiply(batch1.transpose(1, 2).conj().bmm(grad), alpha.conj()) + result: maybe_multiply(self_t, beta) + maybe_multiply(batch1_t.bmm(batch2_p), alpha) + maybe_multiply(batch1_p.bmm(batch2_t), alpha) + +- name: bernoulli(Tensor self, *, Generator? generator=None) -> Tensor + self: zeros_like(grad) + result: auto_element_wise + +- name: bernoulli_.Tensor(Tensor(a!) self, Tensor p, *, Generator? generator=None) -> Tensor(a!) + self: zeros_like(grad) + p: zeros_like(p) + result: self_t.zero_() + +- name: bernoulli_.float(Tensor(a!) self, float p=0.5, *, Generator? generator=None) -> Tensor(a!) + self: zeros_like(grad) + result: self_t.zero_() + +- name: bmm(Tensor self, Tensor mat2) -> Tensor + self: grad.bmm(mat2.transpose(1, 2).conj()) + mat2: self.transpose(1, 2).conj().bmm(grad) + result: self_t.bmm(mat2_p) + self_p.bmm(mat2_t) + +- name: matmul(Tensor self, Tensor other) -> Tensor + self, other: matmul_backward(grad, self, other, grad_input_mask) + +- name: cat(Tensor[] tensors, int dim=0) -> Tensor + tensors: cat_tensors_backward(grad, to_args_sizes_symint(tensors), to_args_scalartypes(tensors), dim) + result: cat_jvp(tensors, dim) + +- name: cauchy_(Tensor(a!) self, float median=0, float sigma=1, *, Generator? generator=None) -> Tensor(a!) + self: zeros_like(grad) + result: self_t.zero_() + +- name: ceil(Tensor self) -> Tensor + self: zeros_like(grad) + result: auto_element_wise + +- name: cholesky(Tensor self, bool upper=False) -> Tensor + self: cholesky_backward(grad, upper, result) + +- name: chunk(Tensor(a -> *) self, int chunks, int dim=0) -> Tensor(a)[] + dispatch: + Default: + # the default case will use the CompositeImplicitAutograd + self: not_implemented("chunk") + AutogradNestedTensor: + self: chunk_backward_nested(grads, self, chunks, dim) + +- name: linalg_cholesky_ex(Tensor self, *, bool upper=False, bool check_errors=False) -> (Tensor L, Tensor info) + self: cholesky_backward(grad, upper, L) + L: cholesky_jvp(self_t, L, upper) + +- name: cholesky_solve(Tensor self, Tensor input2, bool upper=False) -> Tensor + self, input2: cholesky_solve_backward(grad, self, input2, result, upper, grad_input_mask) + result: cholesky_solve_jvp(result, input2_p, input2_t, self_t, upper) + +- name: cholesky_inverse(Tensor self, bool upper=False) -> Tensor + self: cholesky_inverse_backward(grad, self, upper, result) + result: cholesky_inverse_jvp(self_p, self_t, result, upper) + +# For clamp, gradient is not defined at the boundaries. But empirically it's helpful +# to be able to get gradient on min and max, so we return the subgradient 1 for these cases. +- name: clamp.Tensor(Tensor self, Tensor? min=None, Tensor? 
max=None) -> Tensor + self: clamp_backward(grad, self, min, max) + min, max: clamp_backward_min_max(grad, self, min, max, grad_input_mask) + result: clamp_jvp(self_p, self_t, min_p, min_t, max_p, max_t) + +- name: clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> Tensor + self: clamp_backward(grad, self, min, max) + result: auto_element_wise + +- name: clamp_min(Tensor self, Scalar min) -> Tensor + self: where(self >= min, grad, at::scalar_tensor(0., grad.options())) + result: auto_element_wise + +- name: clamp_min.Tensor(Tensor self, Tensor min) -> Tensor + self: where(self >= min, grad, at::scalar_tensor(0., grad.options())) + min: where(self < min, grad, at::scalar_tensor(0., grad.options())) + result: where(self_p >= min_p, self_t, min_t) + +- name: clamp_max(Tensor self, Scalar max) -> Tensor + self: where(self <= max, grad, at::scalar_tensor(0., grad.options())) + result: auto_element_wise + +- name: clamp_max.Tensor(Tensor self, Tensor max) -> Tensor + self: where(self <= max, grad, at::scalar_tensor(0., grad.options())) + max: where(self > max, grad, at::scalar_tensor(0., grad.options())) + result: where(self_p <= max_p, self_t, max_t) + +- name: clone(Tensor self, *, MemoryFormat? memory_format=None) -> Tensor + self: grad + result: auto_linear + +- name: _lazy_clone(Tensor self) -> Tensor + self: grad + result: auto_linear + +- name: _to_copy(Tensor self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, bool non_blocking=False, MemoryFormat? memory_format=None) -> Tensor + self: _to_copy_backward(grad, self.options()) + result: _to_copy(self_t, dtype, layout, device, pin_memory, non_blocking, memory_format) + # The condition is: if dtype is not nullopt, then isDifferentiableType(*dtype) + # (If dtype IS nullopt, we rely on the regular check that any input requires grad). + output_differentiability: ["!dtype || isDifferentiableType(*dtype)"] + +- name: _coalesce(Tensor self) -> Tensor + self: grad + +- name: complex(Tensor real, Tensor imag) -> Tensor + real: at::real(grad) + imag: at::imag(grad) + result: at::complex(real_t, imag_t) + +- name: polar(Tensor abs, Tensor angle) -> Tensor + abs, angle: polar_backward(grad, result) + result: at::complex(abs_t*angle_p.cos() - angle_t*abs_p*angle_p.sin(), abs_t*angle_p.sin() + angle_t*abs_p*angle_p.cos()) + +- name: _conj(Tensor(a) self) -> Tensor(a) + self: grad.conj() + result: self_t.conj() + +- name: _neg_view(Tensor(a) self) -> Tensor(a) + self: grad.neg() + result: self_t._neg_view() + +- name: _conj_physical(Tensor self) -> Tensor + self: grad.conj_physical() + result: self_t.conj_physical() + +- name: conj_physical_(Tensor(a!) self) -> Tensor(a!) + self: grad.conj_physical() + result: self_t.conj_physical_() + +- name: copysign.Tensor(Tensor self, Tensor other) -> Tensor + self: copysign_tensor_self_backward(grad, self, result) + other: zeros_like(other) + result: copysign_tensor_self_backward(self_t, self_p, result) + +- name: copysign.Scalar(Tensor self, Scalar other) -> Tensor + self: copysign_tensor_self_backward(grad, self, result) + result: auto_element_wise + +- name: cos(Tensor self) -> Tensor + self: grad * -self.sin().conj() + result: auto_element_wise + +- name: cosh(Tensor self) -> Tensor + self: grad * self.sinh().conj() + result: auto_element_wise + +- name: count_nonzero.dim_IntList(Tensor self, int[] dim) -> Tensor + output_differentiability: [False] + +- name: count_nonzero(Tensor self, int? 
dim=None) -> Tensor + output_differentiability: [False] + +- name: linalg_cross(Tensor self, Tensor other, *, int dim=-1) -> Tensor + self: at::linalg_cross(other.conj(), grad, dim) + other: at::linalg_cross(grad, self.conj(), dim) + result: "at::linalg_cross(self_t, other_p, dim) + at::linalg_cross(self_p, other_t, dim)" + +- name: logcumsumexp(Tensor self, int dim) -> Tensor + self: logcumsumexp_backward(grad, self, result, dim) + result: logcumsumexp_jvp(self_p, self_t, dim) + +- name: cumprod(Tensor self, int dim, *, ScalarType? dtype=None) -> Tensor + self: cumprod_backward(grad.to(self.scalar_type()), self, dim, result) + result: "cumprod_jvp(self_t, self_p, result, dim).to(dtype.has_value() ? *dtype : self_p.scalar_type())" + +- name: cumsum(Tensor self, int dim, *, ScalarType? dtype=None) -> Tensor + self: cumsum_backward(grad.to(self.scalar_type()), dim) + result: auto_linear + +- name: cummax(Tensor self, int dim) -> (Tensor values, Tensor indices) + self: cummaxmin_backward(grad, self, indices, dim) + values: self_t.gather(dim, indices) + +- name: cummin(Tensor self, int dim) -> (Tensor values, Tensor indices) + self: cummaxmin_backward(grad, self, indices, dim) + values: self_t.gather(dim, indices) + +- name: conv_tbc(Tensor self, Tensor weight, Tensor bias, int pad=0) -> Tensor + self, weight, bias: "grad.defined() ? conv_tbc_backward(grad, self, weight, bias, pad) : std::tuple<Tensor, Tensor, Tensor>()" + +- name: _ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank=0, bool zero_infinity=False) -> (Tensor, Tensor) + log_probs: _ctc_loss_backward(grad, log_probs, targets, input_lengths, target_lengths, result0, result1, blank, zero_infinity) + +- name: _ctc_loss.Tensor(Tensor log_probs, Tensor targets, Tensor input_lengths, Tensor target_lengths, int blank=0, bool zero_infinity=False) -> (Tensor, Tensor) + log_probs: _ctc_loss_backward(grad, log_probs, targets, input_lengths, target_lengths, result0, result1, blank, zero_infinity) + +- name: deg2rad(Tensor self) -> Tensor + self: deg2rad_backward(grad) + result: auto_element_wise + +- name: _linalg_det(Tensor A) -> (Tensor result, Tensor LU, Tensor pivots) + A: linalg_det_backward(grad, result, A, LU, pivots) + result: linalg_det_jvp(A_t, result, LU, pivots, A_p.is_contiguous() && !A_p.is_complex()) + output_differentiability: [True, False, False] + +- name: _linalg_slogdet(Tensor A) -> (Tensor sign, Tensor logabsdet, Tensor LU, Tensor pivots) + A: slogdet_backward(grad_sign, grad_logabsdet, A, sign, LU, pivots) + sign, logabsdet: slogdet_jvp(LU, pivots, A_t, sign, A_p.is_contiguous() && !A_p.is_complex()) + output_differentiability: [True, True, False, False] + +- name: block_diag(Tensor[] tensors) -> Tensor + tensors: block_diag_backward(grad, to_args_sizes(tensors), to_args_scalartypes(tensors)) + result: block_diag_jvp(tensors) + +- name: diag_embed(Tensor self, int offset=0, int dim1=-2, int dim2=-1) -> Tensor + self: grad.diagonal(offset, dim1, dim2) + result: auto_linear + +- name: diagonal(Tensor(a) self, int offset=0, int dim1=0, int dim2=1) -> Tensor(a) + self: diagonal_backward_symint(grad, self.sym_sizes(), offset, dim1, dim2) + result: auto_linear + +- name: diagonal_backward(Tensor grad_output, SymInt[] input_sizes, int offset, int dim1, int dim2) -> Tensor + grad_output: grad.diagonal(offset, dim1, dim2) + result: auto_linear + +- name: dist(Tensor self, Tensor other, Scalar p=2) -> Tensor + self: norm_backward(grad, self - other, p, result) + other: -norm_backward(grad, self - other, p,
result) + result: norm_jvp(self_p - other_p, self_t - other_t, p, result, {}, false) + +# The backward formula is done in this order to improve numerical stability +# of the higher order derivatives, see https://github.com/pytorch/pytorch/issues/43414 +# Note that we don't use "result" because saving it would be BC-breaking when it is used in an inplace operation later +- name: div.Tensor(Tensor self, Tensor other) -> Tensor + self: div_tensor_self_backward(grad, other, self.scalar_type()) + other: div_tensor_other_backward(grad, self, other) + result: (self_t - other_t * result) / other_p + +- name: div.Scalar(Tensor self, Scalar other) -> Tensor + self: div_tensor_self_backward(grad, other, self.scalar_type()) + result: self_t / other + +- name: div.Tensor_mode(Tensor self, Tensor other, *, str? rounding_mode) -> Tensor + self: div_tensor_self_backward(grad, other, self.scalar_type(), rounding_mode) + other: div_tensor_other_backward(grad, self, other, rounding_mode) + result: "rounding_mode.has_value() ? result.new_zeros_symint(result.sym_sizes()) : self_t / other_p - other_t * (self_p / other_p) / other_p" + +- name: div.Scalar_mode(Tensor self, Scalar other, *, str? rounding_mode) -> Tensor + self: div_tensor_self_backward(grad, other, self.scalar_type(), rounding_mode) + result: "rounding_mode.has_value() ? result.new_zeros_symint(result.sym_sizes()) : self_t / other" + +- name: dot(Tensor self, Tensor tensor) -> Tensor + self: grad * tensor.conj() + tensor: grad * self.conj() + result: at::dot(self_t, tensor_p) + at::dot(self_p, tensor_t) + +- name: vdot(Tensor self, Tensor other) -> Tensor + self: grad.conj() * other + other: grad * self + result: at::vdot(self_t, other_p) + at::vdot(self_p, other_t) + +- name: _fused_dropout(Tensor self, float p, Generator? generator=None) -> (Tensor, Tensor) + self: _fused_dropout_backward(grad, result1, p) + +- name: native_dropout(Tensor input, float p, bool? train) -> (Tensor, Tensor) + input: "GradMode::is_enabled() ? infinitely_differentiable_native_dropout_backward(grad, result1, (!train.has_value() || !train.value() ? 1 : (p == 1 ? 0.0 : 1.0 / (1.0 - p)))) : native_dropout_backward(grad, result1, (!train.has_value() || !train.value() ? 1 : (p == 1 ? 0.0 : 1.0 / (1.0 - p))))" + result0: "(!train.has_value() || train.value()) ? (p == 1 ? 0.0 : 1.0 / (1.0 - p)) * input_t * result1 : input_t" + +- name: native_dropout_backward(Tensor grad_output, Tensor mask, float scale) -> Tensor + grad_output: "native_dropout_double_backward(grad, grad_output, mask, scale)" + mask: 'not_implemented("native_dropout_backward: mask")' + +- name: eq_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) + self: zeros_like(self) + result: self_t.zero_() + +- name: eq_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) 
+ self: zeros_like(self) + other: zeros_like(other) + result: self_t.zero_() + +- name: erf(Tensor self) -> Tensor + self: 2.0 / sqrt(M_PI) * exp(-(self.pow(2))) * grad + result: auto_element_wise + +- name: erfc(Tensor self) -> Tensor + self: -2.0 / sqrt(M_PI) * exp(-(self.pow(2))) * grad + result: auto_element_wise + +- name: special_erfcx(Tensor self) -> Tensor + self: (2.0 * self * result - 2.0 / sqrt(M_PI)) * grad + result: auto_element_wise + +- name: erfinv(Tensor self) -> Tensor + self: 0.5 * sqrt(M_PI) * exp(self.erfinv().pow(2)) * grad + result: auto_element_wise + +- name: exp(Tensor self) -> Tensor + self: grad * result.conj() + result: auto_element_wise + +- name: exp2(Tensor self) -> Tensor + self: grad * result.conj() * M_LN2 + result: auto_element_wise + +- name: expm1(Tensor self) -> Tensor + self: grad * (result.conj() + 1) + result: auto_element_wise + +# TODO: this derivative is not SymInt safe, need sum_to support +- name: expand(Tensor(a) self, SymInt[] size, *, bool implicit=False) -> Tensor(a) + self: at::sum_to(grad, self.sym_sizes()) + result: auto_linear + +- name: exponential_(Tensor(a!) self, float lambd=1, *, Generator? generator=None) -> Tensor(a!) + self: zeros_like(grad) + result: self_t.zero_() + +- name: fake_quantize_per_tensor_affine_cachemask(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor output, Tensor mask) + self: fake_quantize_per_tensor_affine_cachemask_backward(grad, mask) + +- name: _fake_quantize_per_tensor_affine_cachemask_tensor_qparams(Tensor self, Tensor scale, Tensor zero_point, Tensor fake_quant_enabled, int quant_min, int quant_max) -> (Tensor output, Tensor mask) + self: fake_quantize_per_tensor_affine_cachemask_backward(grad, mask) + +- name: _fake_quantize_learnable_per_tensor_affine(Tensor self, Tensor scale, Tensor zero_point, int quant_min, int quant_max, float grad_factor=1.0) -> Tensor + self, scale, zero_point: "grad.defined() ? _fake_quantize_learnable_per_tensor_affine_backward(grad, self, scale, zero_point, quant_min, quant_max, grad_factor) : std::tuple<Tensor, Tensor, Tensor>()" + +- name: fake_quantize_per_channel_affine_cachemask(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor output, Tensor mask) + self: fake_quantize_per_channel_affine_cachemask_backward(grad, mask) + +- name: _fake_quantize_learnable_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max, float grad_factor=1.0) -> Tensor + self, scale, zero_point: "grad.defined() ? _fake_quantize_learnable_per_channel_affine_backward(grad, self, scale, zero_point, axis, quant_min, quant_max, grad_factor) : std::tuple<Tensor, Tensor, Tensor>()" + +- name: _fused_moving_avg_obs_fq_helper(Tensor self, Tensor observer_on, Tensor fake_quant_on, Tensor(a!) running_min, Tensor(b!) running_max, Tensor(c!) scale, Tensor(d!) zero_point, float averaging_const, int quant_min, int quant_max, int ch_axis, bool per_row_fake_quant=False, bool symmetric_quant=False) -> (Tensor output, Tensor mask) + self: fake_quantize_per_tensor_affine_cachemask_backward(grad, mask) + +- name: fill.Scalar(Tensor self, Scalar value) -> Tensor + self: zeros_like(grad) + result: at::fill(self_t, 0) + +- name: fill.Tensor(Tensor self, Tensor value) -> Tensor + self: zeros_like(grad) + value: grad.sum() + result: at::fill(self_t, value_t) + +- name: fill_.Scalar(Tensor(a!) self, Scalar value) -> Tensor(a!) + self: zeros_like(grad) + result: self_t.fill_(0) + +- name: fill_.Tensor(Tensor(a!)
+
+- name: exponential_(Tensor(a!) self, float lambd=1, *, Generator? generator=None) -> Tensor(a!)
+  self: zeros_like(grad)
+  result: self_t.zero_()
+
+- name: fake_quantize_per_tensor_affine_cachemask(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor output, Tensor mask)
+  self: fake_quantize_per_tensor_affine_cachemask_backward(grad, mask)
+
+- name: _fake_quantize_per_tensor_affine_cachemask_tensor_qparams(Tensor self, Tensor scale, Tensor zero_point, Tensor fake_quant_enabled, int quant_min, int quant_max) -> (Tensor output, Tensor mask)
+  self: fake_quantize_per_tensor_affine_cachemask_backward(grad, mask)
+
+- name: _fake_quantize_learnable_per_tensor_affine(Tensor self, Tensor scale, Tensor zero_point, int quant_min, int quant_max, float grad_factor=1.0) -> Tensor
+  self, scale, zero_point: "grad.defined() ? _fake_quantize_learnable_per_tensor_affine_backward(grad, self, scale, zero_point, quant_min, quant_max, grad_factor) : std::tuple<Tensor, Tensor, Tensor>()"
+
+- name: fake_quantize_per_channel_affine_cachemask(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor output, Tensor mask)
+  self: fake_quantize_per_channel_affine_cachemask_backward(grad, mask)
+
+- name: _fake_quantize_learnable_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max, float grad_factor=1.0) -> Tensor
+  self, scale, zero_point: "grad.defined() ? _fake_quantize_learnable_per_channel_affine_backward(grad, self, scale, zero_point, axis, quant_min, quant_max, grad_factor) : std::tuple<Tensor, Tensor, Tensor>()"
+
+- name: _fused_moving_avg_obs_fq_helper(Tensor self, Tensor observer_on, Tensor fake_quant_on, Tensor(a!) running_min, Tensor(b!) running_max, Tensor(c!) scale, Tensor(d!) zero_point, float averaging_const, int quant_min, int quant_max, int ch_axis, bool per_row_fake_quant=False, bool symmetric_quant=False) -> (Tensor output, Tensor mask)
+  self: fake_quantize_per_tensor_affine_cachemask_backward(grad, mask)
+
+- name: fill.Scalar(Tensor self, Scalar value) -> Tensor
+  self: zeros_like(grad)
+  result: at::fill(self_t, 0)
+
+- name: fill.Tensor(Tensor self, Tensor value) -> Tensor
+  self: zeros_like(grad)
+  value: grad.sum()
+  result: at::fill(self_t, value_t)
+
+- name: fill_.Scalar(Tensor(a!) self, Scalar value) -> Tensor(a!)
+  self: zeros_like(grad)
+  result: self_t.fill_(0)
+
+- name: fill_.Tensor(Tensor(a!) self, Tensor value) -> Tensor(a!)
+  self: zeros_like(grad)
+  value: grad.sum()
+  result: self_t.fill_(value_t)
+
+- name: floor(Tensor self) -> Tensor
+  self: zeros_like(grad)
+  result: auto_element_wise
+
+- name: fmod.Scalar(Tensor self, Scalar other) -> Tensor
+  self: grad
+  result: auto_element_wise
+
+- name: fmod.Tensor(Tensor self, Tensor other) -> Tensor
+  self: grad
+  other: -grad * self.div(other, /*rounding_mode=*/"trunc")
+  result: self_t - other_t * self_p.div(other_p, /*rounding_mode=*/"trunc")
+
+- name: frac(Tensor self) -> Tensor
+  self: grad
+  result: self_t
+
+- name: frexp.Tensor(Tensor self) -> (Tensor mantissa, Tensor exponent)
+  self: grad / exponent.exp2()
+  mantissa: self_t / exponent.exp2()
+
+- name: gather(Tensor self, int dim, Tensor index, *, bool sparse_grad=False) -> Tensor
+  self: gather_backward(grad, self, dim, index, sparse_grad)
+  index: non_differentiable
+  result: auto_linear
+
+- name: ge_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!)
+  self: zeros_like(self)
+  result: self_t.zero_()
+
+- name: ge_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!)
+  self: zeros_like(self)
+  other: zeros_like(other)
+  result: self_t.zero_()
+
+- name: geometric_(Tensor(a!) self, float p, *, Generator? generator=None) -> Tensor(a!)
+  self: zeros_like(grad)
+  result: self_t.zero_()
+
+- name: geqrf(Tensor self) -> (Tensor a, Tensor tau)
+  self: not_implemented("geqrf")
+
+- name: indices(Tensor(a) self) -> Tensor(a)
+  output_differentiability: [False]
+
+- name: _indices(Tensor(a) self) -> Tensor(a)
+  output_differentiability: [False]
+
+- name: crow_indices(Tensor(a) self) -> Tensor(a)
+  output_differentiability: [False]
+
+- name: col_indices(Tensor(a) self) -> Tensor(a)
+  output_differentiability: [False]
+
+- name: ccol_indices(Tensor(a) self) -> Tensor(a)
+  output_differentiability: [False]
+
+- name: row_indices(Tensor(a) self) -> Tensor(a)
+  output_differentiability: [False]
+
+- name: grid_sampler_2d(Tensor input, Tensor grid, int interpolation_mode, int padding_mode, bool align_corners) -> Tensor
+  input, grid: "grad.defined() ? grid_sampler_2d_backward(grad, input, grid, interpolation_mode, padding_mode, align_corners, grad_input_mask) : std::tuple<Tensor, Tensor>()"
+
+- name: grid_sampler_3d(Tensor input, Tensor grid, int interpolation_mode, int padding_mode, bool align_corners) -> Tensor
+  input, grid: "grad.defined() ? grid_sampler_3d_backward(grad, input, grid, interpolation_mode, padding_mode, align_corners, grad_input_mask) : std::tuple<Tensor, Tensor>()"
+
+# See NOTE [ grid_sample CPU fallback ]
+- name: _grid_sampler_2d_cpu_fallback(Tensor input, Tensor grid, int interpolation_mode, int padding_mode, bool align_corners) -> Tensor
+  input, grid: "grad.defined() ? _grid_sampler_2d_cpu_fallback_backward(grad, input, grid, interpolation_mode, padding_mode, align_corners) : std::tuple<Tensor, Tensor>()"
+
+- name: gt_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!)
+  self: zeros_like(self)
+  result: self_t.zero_()
+
+- name: gt_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!)
+  self: zeros_like(self)
+  other: zeros_like(other)
+  result: self_t.zero_()
+
+- name: hardsigmoid(Tensor self) -> Tensor
+  self: hardsigmoid_backward(grad, self)
+  result: auto_element_wise
+
+- name: histc(Tensor self, int bins=100, Scalar min=0, Scalar max=0) -> Tensor
+  output_differentiability: [False]
+
+- name: hardswish(Tensor self) -> Tensor
+  self: hardswish_backward(grad, self)
+  result: auto_element_wise
+
+- name: hardswish_backward(Tensor grad_output, Tensor self) -> Tensor
+  grad_output: hardswish_backward(grad, self)
+  self: at::where(at::logical_and(-3.0 < self, self < 3.0), grad * grad_output / 3.0, at::zeros({}, self.options()))
+  result: "hardswish_backward(grad_output_t, self_p) +
+           at::where(at::logical_and(-3.0 < self_p, self_p < 3.0), self_t * grad_output_p / 3.0, at::zeros({}, self_p.options()))"
+
+- name: hypot(Tensor self, Tensor other) -> Tensor
+  self: grad * self / result
+  other: grad * other / result
+  result: self_t * self_p / result + other_t * other_p / result
+
+- name: i0(Tensor self) -> Tensor
+  self: grad * at::special_i1(self)
+  result: auto_element_wise
+
+- name: special_i0e(Tensor self) -> Tensor
+  self: grad * (at::special_i1e(self) - self.sgn() * result)
+  result: auto_element_wise
+
+- name: special_i1(Tensor self) -> Tensor
+  self: i1_backward(grad, self, result)
+  result: auto_element_wise
+
+- name: special_i1e(Tensor self) -> Tensor
+  self: i1e_backward(grad, self, result)
+  result: auto_element_wise
+
+- name: igamma(Tensor self, Tensor other) -> Tensor
+  self: 'not_implemented("igamma: input")'
+  other: grad * exp((self - 1) * log(other) - other - lgamma(self))
+
+- name: igammac(Tensor self, Tensor other) -> Tensor
+  self: 'not_implemented("igammac: input")'
+  other: -grad * exp((self - 1) * log(other) - other - lgamma(self))
+
+- name: index.Tensor(Tensor self, Tensor?[] indices) -> Tensor
+  self: index_backward(grad.new_zeros_symint(self.sym_sizes(), self.options()), indices, grad)
+  result: auto_linear
+
+- name: _unsafe_index.Tensor(Tensor self, Tensor?[] indices) -> Tensor
+  self: at::_unsafe_index_put(grad.new_zeros_symint(self.sym_sizes(), self.options()), indices, grad, true)
+  result: auto_linear
+
+- name: _unsafe_masked_index(Tensor self, Tensor mask, Tensor?[] indices, Scalar fill) -> Tensor
+  self: at::_unsafe_masked_index_put_accumulate(grad.new_zeros_symint(self.sym_sizes(), self.options()), mask, indices, grad)
+  mask: non_differentiable
+  result: _unsafe_masked_index(self_t, mask, indices, 0)
+
+- name: _unsafe_masked_index_put_accumulate(Tensor self, Tensor mask, Tensor?[] indices, Tensor values) -> Tensor
+  self: grad
+  mask: non_differentiable
+  values: at::_unsafe_masked_index(grad, mask, indices, 0)
+  result: at::_unsafe_masked_index_put_accumulate(self_t, mask, indices, values_t)
+
+- name: index_add(Tensor self, int dim, Tensor index, Tensor source, *, Scalar alpha=1) -> Tensor
+  self: grad
+  # The case source.dim() == 0 is necessary to support scalar tensors of the form
+  # source.dim() == 0 and index.dim() == 1 and index.size() == (1,),
+  # This is because source is not broadcastable to index, as source.dim() < index.dim()
+  source: "maybe_multiply(source.dim() > 0 ? grad.index_select(dim, index).expand_as(source) : grad.index_select(dim, index.squeeze(0)), alpha)"
+  index: non_differentiable
+  result: at::index_add(self_t, dim, index, maybe_multiply(source_t, alpha))
+
+- name: index_reduce(Tensor self, int dim, Tensor index, Tensor source, str reduce, *, bool include_self=True) -> Tensor
+  self, source: index_reduce_backward(grad, self, dim, index, source, reduce, include_self, result)
+  index: non_differentiable
+
+- name: index_copy(Tensor self, int dim, Tensor index, Tensor source) -> Tensor
+  self: grad.index_fill(dim, index, 0)
+  # The case source.dim() == 0 is necessary to support scalar tensors of the form
+  # source.dim() == 0 and index.dim() == 1 and index.size() == (1,),
+  # This is because source is not broadcastable to index, as source.dim() < index.dim()
+  source: "source.dim() > 0 ? grad.index_select(dim, index).expand_as(source) : grad.index_select(dim, index.squeeze(0))"
+  index: non_differentiable
+  result: self_t.index_copy(dim, index, source_t)
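+
+# Illustrative sketch (comment only, not consumed by codegen): for index_add and
+# index_copy above, the gradient w.r.t. source is the rows of the incoming
+# gradient selected at `index` (scaled by alpha for index_add). Assuming a
+# standard PyTorch install:
+#
+#   import torch
+#   x = torch.zeros(4, requires_grad=True)
+#   s = torch.zeros(2, requires_grad=True)
+#   y = x.index_add(0, torch.tensor([1, 3]), s, alpha=2.0)
+#   y.backward(torch.tensor([10., 20., 30., 40.]))
+#   print(s.grad)                  # tensor([40., 80.]) == alpha * grad[index]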
+
+- name: index_fill.int_Scalar(Tensor self, int dim, Tensor index, Scalar value) -> Tensor
+  self: grad.index_fill(dim, index, 0)
+  index: non_differentiable
+  result: self_t.index_fill(dim, index, 0)
+
+- name: index_fill.int_Tensor(Tensor self, int dim, Tensor index, Tensor value) -> Tensor
+  self: grad.index_fill(dim, index, 0)
+  value: grad.index_select(dim, std::get<0>(at::_unique(index, /*sorted=*/false))).sum()
+  index: non_differentiable
+  result: self_t.index_fill(dim, index, value_t)
+
+- name: index_put(Tensor self, Tensor?[] indices, Tensor values, bool accumulate=False) -> Tensor
+  self: "accumulate ? grad : grad.index_put(indices, zeros_like(values), false)"
+  values: grad.index(indices)
+  result: self_t.index_put(indices, values_t, accumulate)
+
+- name: _unsafe_index_put(Tensor self, Tensor?[] indices, Tensor values, bool accumulate=False) -> Tensor
+  self: "accumulate ? grad : at::_unsafe_index_put(grad, indices, zeros_like(values), false)"
+  values: at::_unsafe_index(grad, indices)
+  result: at::_unsafe_index_put(self_t, indices, values_t, accumulate)
+
+- name: _index_put_impl_(Tensor(a!) self, Tensor?[] indices, Tensor values, bool accumulate=False, bool unsafe=False) -> Tensor(a!)
+  self: "accumulate ? grad : grad.index_put(indices, zeros_like(values), false)"
+  values: grad.index(indices)
+  result: at::_index_put_impl_(self_t, indices, values_t, accumulate, unsafe)
+
+- name: index_select(Tensor self, int dim, Tensor index) -> Tensor
+  self: index_select_backward_symint(grad, self.sym_sizes(), dim, index)
+  index: non_differentiable
+  result: auto_linear
+
+- name: linalg_inv_ex(Tensor A, *, bool check_errors=False) -> (Tensor inverse, Tensor info)
+  A: -at::matmul(inverse.mH(), at::matmul(grad, inverse.mH()))
+  inverse: -at::matmul(at::matmul(inverse, A_t), inverse)
+  output_differentiability: [True, False]
+
+- name: linalg_pinv.atol_rtol_tensor(Tensor self, *, Tensor? atol=None, Tensor? rtol=None, bool hermitian=False) -> Tensor
+  self: pinv_backward(grad, result, self)
+  result: pinv_jvp(self_p, result, self_t)
+
+- name: isnan(Tensor self) -> Tensor
+  self: non_differentiable
+
+- name: kthvalue(Tensor self, SymInt k, int dim=-1, bool keepdim=False) -> (Tensor values, Tensor indices)
+  self: value_selecting_reduction_backward_symint(grad, dim, indices, self.sym_sizes(), keepdim)
+  values: gather_with_keepdimed_indices(self_t, dim, indices, keepdim)
+
+- name: le_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!)
+  self: zeros_like(self)
+  result: self_t.zero_()
+
+- name: le_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!)
+  self: zeros_like(self)
+  other: zeros_like(other)
+  result: self_t.zero_()
+
+- name: lerp.Scalar(Tensor self, Tensor end, Scalar weight) -> Tensor
+  self: "weight.isComplex() ? grad * (1 - weight.conj().toComplexDouble()) : grad * (1 - weight.toDouble())"
+  end: grad * weight.conj()
+  result: at::lerp(self_t, end_t, weight)
+
+- name: lerp.Tensor(Tensor self, Tensor end, Tensor weight) -> Tensor
+  self: grad * (1 - weight).conj()
+  end: grad * weight.conj()
+  weight: grad * (end - self).conj()
+  result: at::lerp(self_t, end_t, weight_p) + weight_t * (end_p - self_p)
+
+- name: lgamma(Tensor self) -> Tensor
+  self: grad * digamma(self)
+  result: auto_element_wise
+
+- name: digamma(Tensor self) -> Tensor
+  self: grad * polygamma(1, self)
+  result: auto_element_wise
+
+- name: polygamma(int n, Tensor self) -> Tensor
+  self: grad * polygamma(n + 1, self)
+  result: auto_element_wise
+
+- name: polygamma_(Tensor(a!) self, int n) -> Tensor(a!)
+  self: grad * polygamma(n + 1, self)
+  result: self_t.mul_(polygamma(n + 1, original_self_p))
+
+- name: log(Tensor self) -> Tensor
+  self: grad.div(self.conj())
+  result: auto_element_wise
+
+- name: log10(Tensor self) -> Tensor
+  self: grad / (self.conj() * 2.3025850929940456)
+  result: auto_element_wise
+
+- name: log1p(Tensor self) -> Tensor
+  self: log1p_backward(grad, self)
+  result: auto_element_wise
+
+- name: log2(Tensor self) -> Tensor
+  self: grad / (self.conj() * 0.6931471805599453)
+  result: auto_element_wise
+
+- name: logaddexp(Tensor self, Tensor other) -> Tensor
+  self: grad / (1 + exp(other - self)).conj()
+  other: grad / (1 + exp(self - other)).conj()
+  result: self_t / (1 + exp(other_p - self_p)) + other_t / (1 + exp(self_p - other_p))
+
+- name: logaddexp2(Tensor self, Tensor other) -> Tensor
+  self: grad / (1 + pow(2, other - self))
+  other: grad / (1 + pow(2, self - other))
+  result: self_t / (1 + pow(2, other_p - self_p)) + other_t / (1 + pow(2, self_p - other_p))
+
+# Note [Gradient formula for xlogy at x = 0, y <= 0]
+# x * log(y) is not defined at y <= 0, so we cannot even talk about differentiability
+# Now, xlogy(0, y) = 0 by definition.
+# This does not make it differentiable as it's not defined in a neighbourhood of a point
+# (0, y) when y <= 0.
+# Now, when a function is non-differentiable, sometimes we return "a relatively sensible value"
+# In this case, as per the discussion in https://github.com/pytorch/pytorch/issues/80770, we choose
+# this value to be zero, which is the directional derivative along the line {x = 0}.
+- name: xlogy.Tensor(Tensor self, Tensor other) -> Tensor
+  self: at::xlogy(grad, other).masked_fill((self == 0.) & (other <= 0.), 0.)
+  other: grad * self / other
+  result: at::xlogy(self_t, other_p).masked_fill((self_p == 0.) & (other_p <= 0.), 0.) + other_t * self_p / other_p
+
+- name: xlogy.Scalar_Self(Scalar self, Tensor other) -> Tensor
+  other: grad * self / other
+  result: auto_element_wise
+
+- name: xlogy.Scalar_Other(Tensor self, Scalar other) -> Tensor
+  self: "other.toDouble() > 0.
+    ? at::xlogy(grad, other)
+    : at::xlogy(grad, other).masked_fill(self == 0., 0.)"
+  result: auto_element_wise
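+
+# Illustrative sketch (comment only, not consumed by codegen): the choice made
+# in Note [Gradient formula for xlogy at x = 0, y <= 0] is observable directly.
+# Along the line {x = 0} the function is identically zero, so the returned
+# gradient w.r.t. x at (0, y <= 0) is 0 rather than log(y), which would be nan.
+# Assuming a standard PyTorch install:
+#
+#   import torch
+#   x = torch.tensor(0., requires_grad=True)
+#   torch.xlogy(x, torch.tensor(-1.)).backward()
+#   print(x.grad)                  # tensor(0.), not nan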
+
+# See Note [Gradient formula for xlogy at x = 0, y <= 0]
+# Same here but with y <= -1
+- name: special_xlog1py(Tensor self, Tensor other) -> Tensor
+  self: at::special_xlog1py(grad, other).masked_fill((self == 0.) & (other <= -1.), 0.)
+  other: grad * self / (other + 1)
+  result: at::special_xlog1py(self_t, other_p).masked_fill((self_p == 0.) & (other_p <= -1.), 0.) + other_t * self_p / (other_p + 1)
+
+- name: special_xlog1py.self_scalar(Scalar self, Tensor other) -> Tensor
+  other: grad * self / (other + 1)
+  result: auto_element_wise
+
+- name: special_xlog1py.other_scalar(Tensor self, Scalar other) -> Tensor
+  self: "other.toDouble() > -1.
+    ? at::special_xlog1py(grad, other)
+    : at::special_xlog1py(grad, other).masked_fill(self == 0., 0.)"
+  result: auto_element_wise
+
+- name: special_zeta(Tensor self, Tensor other) -> Tensor
+  self: not_implemented("zeta")
+  other: grad * -self * special_zeta(self + 1., other)
+
+- name: special_zeta.self_scalar(Scalar self, Tensor other) -> Tensor
+  other: grad * -self * special_zeta(self.toDouble() + 1., other)
+
+- name: special_zeta.other_scalar(Tensor self, Scalar other) -> Tensor
+  self: not_implemented("zeta")
+
+- name: log_normal_(Tensor(a!) self, float mean=1, float std=2, *, Generator? generator=None) -> Tensor(a!)
+  self: zeros_like(grad)
+  result: self_t.zero_()
+
+- name: logsumexp(Tensor self, int[1] dim, bool keepdim=False) -> Tensor
+  self: logsumexp_backward(grad, self, result, dim, keepdim)
+  result: logsumexp_jvp(self_p, self_t, dim, keepdim)
+
+- name: linalg_lstsq(Tensor self, Tensor b, float? rcond=None, *, str? driver=None) -> (Tensor solution, Tensor residuals, Tensor rank, Tensor singular_values)
+  self, b: linalg_lstsq_backward(grads[0], grads[1], self, b, solution, grad_input_mask)
+  solution: linalg_lstsq_solution_jvp(self_p, b_p, self_t, b_t)
+  residuals: linalg_lstsq_residuals_jvp(self_p, b_p, self_t, b_t, solution, residuals)
+  output_differentiability: [True, True, False, False]
+
+- name: lt_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!)
+  self: zeros_like(self)
+  result: self_t.zero_()
+
+- name: lt_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!)
+  self: zeros_like(self)
+  other: zeros_like(other)
+  result: self_t.zero_()
+
+- name: linalg_lu_factor_ex(Tensor A, *, bool pivot=True, bool check_errors=False) -> (Tensor LU, Tensor pivots, Tensor info)
+  A: lu_factor_ex_backward(grad, LU, pivots, pivot)
+  LU: lu_factor_ex_jvp(A_t, LU, pivots, pivot)
+  output_differentiability: [True, False, False]
+
+- name: linalg_lu(Tensor A, *, bool pivot=True) -> (Tensor P, Tensor L, Tensor U)
+  A: linalg_lu_backward(grad_L, grad_U, P, L, U, pivot)
+  L: std::get<0>(linalg_lu_jvp(A_t, P, L, U, pivot))
+  U: std::get<1>(linalg_lu_jvp(A_t, P, L, U, pivot))
+  output_differentiability: [False, True, True]
+
+- name: linalg_lu_solve(Tensor LU, Tensor pivots, Tensor B, *, bool left=True, bool adjoint=False) -> Tensor
+  LU: linalg_lu_solve_LU(grad, LU, pivots, result, left, adjoint)
+  B: "at::linalg_lu_solve(LU, pivots, grad, left, !adjoint)"
+  result: linalg_lu_solve_jvp(result, LU_p, pivots, LU_t, B_t, left, adjoint)
+
+- name: lu_unpack(Tensor LU_data, Tensor LU_pivots, bool unpack_data=True, bool unpack_pivots=True) -> (Tensor P, Tensor L, Tensor U)
+  LU_data: lu_unpack_backward(grad_L, grad_U, LU_data.sym_size(-2), LU_data.sym_size(-1))
+  LU_pivots: non_differentiable
+  L: "LU_data_t.sym_size(-2) >= LU_data_t.sym_size(-1) ? LU_data_t.tril_symint(-1) : LU_data_t.narrow_symint(-1, 0, LU_data_t.sym_size(-2)).tril_symint(-1)"
+  U: "LU_data_t.sym_size(-1) >= LU_data_t.sym_size(-2) ? LU_data_t.triu_symint() : LU_data_t.narrow_symint(-2, 0, LU_data_t.sym_size(-1)).triu_symint()"
+  output_differentiability: [False, True, True]
+
+- name: masked_fill.Scalar(Tensor self, Tensor mask, Scalar value) -> Tensor
+  self: grad.masked_fill(mask, 0)
+  mask: non_differentiable
+  result: self_t.masked_fill(mask, 0)
+
+- name: masked_fill.Tensor(Tensor self, Tensor mask, Tensor value) -> Tensor
+  self: grad.masked_fill(mask, 0)
+  value: masked_fill_backward(grad, mask)
+  mask: non_differentiable
+  result: self_t.masked_fill(mask, value_t)
+
+- name: masked_scatter(Tensor self, Tensor mask, Tensor source) -> Tensor
+  self: grad.masked_fill(mask, 0)
+  source: masked_scatter_backward_symint(grad, mask, source.sym_sizes())
+  mask: non_differentiable
+  result: self_t.masked_scatter(mask, source_t)
+
+- name: masked_scatter_backward(Tensor grad_output, Tensor mask, SymInt[] sizes) -> Tensor
+  grad_output: zeros_like(grad_output).masked_scatter(mask, grad)
+  mask: non_differentiable
+  result: masked_scatter_backward(grad_output_t, mask, grad_output_t.sizes())
+
+- name: masked_select(Tensor self, Tensor mask) -> Tensor
+  self: masked_select_backward(grad, self, mask)
+  mask: non_differentiable
+  result: auto_linear
+
+- name: linalg_matrix_exp(Tensor self) -> Tensor
+  self: linalg_matrix_exp_differential(self, grad, /*adjoint*/ true)
+  result: linalg_matrix_exp_differential(self_p, self_t, /*adjoint*/ false)
+
+- name: max.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
+  self: value_selecting_reduction_backward_symint(grad, dim, indices, self.sym_sizes(), keepdim)
+  values: gather_with_keepdimed_indices(self_t, dim, indices, keepdim)
+
+- name: max(Tensor self) -> Tensor
+  self: evenly_distribute_backward(grad, self, result)
+  result: evenly_read_jvp(self_t, self_p, result)
+
+- name: maximum(Tensor self, Tensor other) -> Tensor
+  self: at::where(self == other, grad / 2, grad).masked_fill_(self < other, 0)
+  other: at::where(self == other, grad / 2, grad).masked_fill_(self > other, 0)
+  result: other_t + at::where(self_p == other_p, at::scalar_tensor(0.5, result.options()), (self_p > other_p).to(result.scalar_type())) * (self_t - other_t)
+
+- name: fmax(Tensor self, Tensor other) -> Tensor
+  self: grad.masked_fill((self >= other).logical_or_(other.isnan()).logical_not_(), 0)
+  other: grad.masked_fill((self >= other).logical_or_(other.isnan()), 0)
+  result: other_t + (self_p > other_p).logical_or_(other_p.isnan()) * (self_t - other_t)
+
+- name: mean(Tensor self, *, ScalarType? dtype=None) -> Tensor
+  dispatch:
+    Default:
+      self: grad.expand_symint(self.sym_sizes()) / self.sym_numel()
+      result: auto_linear
+    AutogradNestedTensor:
+      # TODO: replace this with grad.expand_as(self) / self.sym_numel() when that is supported
+      self: (ones_like(self) * grad) / self.sym_numel()
+      result: auto_linear
+
+- name: mean.dim(Tensor self, int[1]? dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor
+  self: mean_backward(grad, self.sym_sizes(), dim, self.sym_numel(), keepdim)
+  result: auto_linear
+
+- name: median(Tensor self) -> Tensor
+  self: evenly_distribute_backward(grad, self, result)
+  result: evenly_read_jvp(self_t, self_p, result)
+
+- name: nanmedian(Tensor self) -> Tensor
+  self: evenly_distribute_backward(grad, self, result)
+  result: evenly_read_jvp(self_t, self_p, result)
+
+# This is in theory incorrect in the following case:
+# sorted list: [..., a, b, b, ..., b, b, c, ...] with median = b and the value
+#                                | at middle position of the
+#                                | list between two `b`s. E.g.,
+#                                |
+#                                ^the middle position
+# The gradient exists and is essentially 0 in this case.
+#
+# In case where the middle position is at the boundary of `b` range, e.g.,
+# sorted list: [..., a, b, b, ..., b, b, c, ...]
+#                                     |
+#                                     ^the middle position
+# The backward implementation is correct in the sense that it returns the
+# subgradient on one side.
+- name: median.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
+  self: value_selecting_reduction_backward_symint(grad, dim, indices, self.sym_sizes(), keepdim)
+  values: gather_with_keepdimed_indices(self_t, dim, indices, keepdim)
+
+- name: nanmedian.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
+  self: value_selecting_reduction_backward_symint(grad, dim, indices, self.sym_sizes(), keepdim)
+  values: gather_with_keepdimed_indices(self_t, dim, indices, keepdim)
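+
+# Illustrative sketch (comment only, not consumed by codegen): the one-sided
+# subgradient described above is visible from Python; median.dim routes the
+# whole incoming gradient to the single element its returned index points at,
+# even when the median value is repeated (which of the tied positions is picked
+# is an implementation detail). Assuming a standard PyTorch install:
+#
+#   import torch
+#   x = torch.tensor([1., 2., 2., 3.], requires_grad=True)
+#   values, indices = x.median(dim=0)
+#   values.backward()
+#   print(indices, x.grad)         # grad is one-hot at the returned index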
+
+- name: min.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
+  self: value_selecting_reduction_backward_symint(grad, dim, indices, self.sym_sizes(), keepdim)
+  values: gather_with_keepdimed_indices(self_t, dim, indices, keepdim)
+
+- name: min(Tensor self) -> Tensor
+  self: evenly_distribute_backward(grad, self, result)
+  result: evenly_read_jvp(self_t, self_p, result)
+
+- name: minimum(Tensor self, Tensor other) -> Tensor
+  self: at::where(self == other, grad / 2, grad).masked_fill_(self > other, 0)
+  other: at::where(self == other, grad / 2, grad).masked_fill_(self < other, 0)
+  result: other_t + at::where(self_p == other_p, at::scalar_tensor(0.5, result.options()), (self_p < other_p).to(result.scalar_type())) * (self_t - other_t)
+
+- name: fmin(Tensor self, Tensor other) -> Tensor
+  self: grad.masked_fill((self <= other).logical_or_(other.isnan()).logical_not_(), 0)
+  other: grad.masked_fill((self <= other).logical_or_(other.isnan()), 0)
+  result: other_t + (self_p <= other_p).logical_or_(other_p.isnan()) * (self_t - other_t)
+
+- name: amax(Tensor self, int[1] dim=[], bool keepdim=False) -> Tensor
+  self: scale_grad_by_count(restore_reduced_dims(grad, dim, keepdim), restore_reduced_dims(result, dim, keepdim) == self, dim)
+  result: amaxamin_jvp(self_p, self_t, result, dim, keepdim)
+
+- name: amin(Tensor self, int[1] dim=[], bool keepdim=False) -> Tensor
+  self: scale_grad_by_count(restore_reduced_dims(grad, dim, keepdim), restore_reduced_dims(result, dim, keepdim) == self, dim)
+  result: amaxamin_jvp(self_p, self_t, result, dim, keepdim)
+
+- name: mm(Tensor self, Tensor mat2) -> Tensor
+  self: mm_mat1_backward(grad, mat2, self.sym_sizes(), self.sym_strides(), self.layout(), 1)
+  mat2: mm_mat2_backward(grad, self, mat2.sym_sizes(), mat2.sym_strides(), mat2.layout(), 1)
+  result: at::mm(self_t, mat2_p) + at::mm(self_p, mat2_t)
+
+- name: _grouped_mm(Tensor self, Tensor mat2, Tensor? offs=None, Tensor? bias=None, ScalarType? out_dtype=None) -> Tensor
+  self: _grouped_mm_mat1_backward(grad, mat2, self.sym_sizes(), self.sym_strides(), self.layout(), offs, 1)
+  mat2: _grouped_mm_mat2_backward(grad, self, mat2.sym_sizes(), mat2.sym_strides(), mat2.layout(), offs, 1)
+
+- name: mode(Tensor self, int dim=-1, bool keepdim=False) -> (Tensor values, Tensor indices)
+  self: value_selecting_reduction_backward_symint(grad, dim, indices, self.sym_sizes(), keepdim)
+  values: gather_with_keepdimed_indices(self_t, dim, indices, keepdim)
+
+- name: mul.Tensor(Tensor self, Tensor other) -> Tensor
+  self: mul_tensor_backward(grad, other, self.scalar_type())
+  other: mul_tensor_backward(grad, self, other.scalar_type())
+  result: other_t * self_p + self_t * other_p
+
+- name: mul.Scalar(Tensor self, Scalar other) -> Tensor
+  self: mul_tensor_backward(grad, other, self.scalar_type())
+  result: self_t * other
+
+- name: mv(Tensor self, Tensor vec) -> Tensor
+  self: grad.ger(vec.conj())
+  vec: self.conj().t().mv(grad)
+  result: mv(self_t, vec_p) + mv(self_p, vec_t)
+
+- name: mvlgamma(Tensor self, int p) -> Tensor
+  self: mvlgamma_backward(grad, self, p)
+  result: auto_element_wise
+
+- name: nan_to_num(Tensor self, float? nan=None, float? posinf=None, float? neginf=None) -> Tensor
+  self: grad * at::isfinite(self)
+  result: auto_element_wise
+
+- name: native_batch_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float momentum, float eps) -> (Tensor, Tensor, Tensor)
+  input, weight, bias: "grad.defined() ? native_batch_norm_backward(grad, input, weight, running_mean, running_var, result1, result2, training, eps, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+  result0: batch_norm_jvp(input_p, input_t, weight_p, weight_t, bias_p, bias_t, running_mean, running_var, result1, result2, training, eps)
+
+- name: _native_batch_norm_legit(Tensor input, Tensor? weight, Tensor? bias, Tensor(a!) running_mean, Tensor(b!) running_var, bool training, float momentum, float eps) -> (Tensor, Tensor, Tensor)
+  input, weight, bias: "grad.defined() ? native_batch_norm_backward(grad, input, weight, running_mean, running_var, result1, result2, training, eps, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+  result0: batch_norm_jvp(input_p, input_t, weight_p, weight_t, bias_p, bias_t, running_mean, running_var, result1, result2, training, eps)
+
+- name: _native_batch_norm_legit_no_training(Tensor input, Tensor? weight, Tensor? bias, Tensor running_mean, Tensor running_var, float momentum, float eps) -> (Tensor, Tensor, Tensor)
+  input, weight, bias: "grad.defined() ? native_batch_norm_backward(grad, input, weight, running_mean, running_var, result1, result2, /*training=*/false, eps, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+  result0: batch_norm_jvp(input_p, input_t, weight_p, weight_t, bias_p, bias_t, running_mean, running_var, result1, result2, /*training=*/false, eps)
+
+- name: _native_batch_norm_legit.no_stats(Tensor input, Tensor? weight, Tensor? bias, bool training, float momentum, float eps) -> (Tensor, Tensor, Tensor)
+  input, weight, bias: "grad.defined() ? native_batch_norm_backward(grad, input, weight, Tensor(), Tensor(), result1, result2, training, eps, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+  result0: batch_norm_jvp(input_p, input_t, weight_p, weight_t, bias_p, bias_t, Tensor(), Tensor(), result1, result2, training, eps)
+
+- name: native_batch_norm_backward(Tensor grad_out, Tensor input, Tensor? weight, Tensor? running_mean, Tensor? running_var, Tensor? save_mean, Tensor? save_invstd, bool train, float eps, bool[3] output_mask) -> (Tensor, Tensor, Tensor)
+  input, weight, grad_out: batchnorm_double_backward(input, weight, grads[0], grads[1], grads[2], grad_out, running_mean, running_var, train, eps, save_mean, save_invstd, grad_input_mask)
+  save_mean: not_implemented("native_batch_norm_backward save_mean")
+  save_invstd: not_implemented("native_batch_norm_backward save_invstd")
+
+- name: native_layer_norm(Tensor input, SymInt[] normalized_shape, Tensor? weight, Tensor? bias, float eps) -> (Tensor, Tensor, Tensor)
+  input, weight, bias: "grad.defined() ? native_layer_norm_backward_symint(grad, input, normalized_shape, result1, result2, weight, bias, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+  result0: layer_norm_jvp(input_p, input_t, weight_p, weight_t, bias_p, bias_t, result1, result2, normalized_shape)
+
+- name: native_layer_norm_backward(Tensor grad_out, Tensor input, SymInt[] normalized_shape, Tensor mean, Tensor rstd, Tensor? weight, Tensor? bias, bool[3] output_mask) -> (Tensor, Tensor, Tensor)
+  input, weight, grad_out: layer_norm_double_backward(input, weight, grads[0], grads[1], grads[2], grad_out, mean, rstd, normalized_shape, grad_input_mask)
+  bias: Tensor()
+  mean: not_implemented("native_layer_norm_backward mean")
+  rstd: not_implemented("native_layer_norm_backward rstd")
+
+- name: _fused_rms_norm(Tensor input, int[] normalized_shape, Tensor? weight, float? eps) -> (Tensor, Tensor)
+  input, weight: "GradMode::is_enabled() || grads[1].defined() ? infinitely_differentiable_native_rms_norm_backward(grads[0], grads[1], input, normalized_shape, result1, weight, grad_input_mask) : (grads[0].defined() ? _fused_rms_norm_backward(grads[0], input, normalized_shape, result1, weight, grad_input_mask) : std::tuple<Tensor, Tensor>())"
+  result0: rms_norm_jvp(input_p, input_t, weight_p, weight_t, result1, normalized_shape)
+  result1: rms_norm_rstd_jvp(input_p, input_t, result1, normalized_shape)
+
+- name: native_group_norm(Tensor input, Tensor? weight, Tensor? bias, SymInt N, SymInt C, SymInt HxW, int group, float eps) -> (Tensor, Tensor, Tensor)
+  input, weight, bias: "GradMode::is_enabled() || grads[1].defined() || grads[2].defined() ? infinitely_differentiable_native_group_norm_backward(grads[0], grads[1], grads[2], input, result1, result2, weight, N, C, HxW, group, eps, grad_input_mask) : (grads[0].defined() ? native_group_norm_backward_symint(grads[0].device().is_xpu() ? grads[0] : grads[0].contiguous(grads[0].device().is_cpu() ? input.suggest_memory_format() : c10::MemoryFormat::Contiguous), input.device().is_xpu() ? input : input.contiguous(input.device().is_cpu() ? input.suggest_memory_format() : c10::MemoryFormat::Contiguous), result1, result2, weight, N, C, HxW, group, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>())"
+  result0: group_norm_jvp(input_p, input_t, weight_p, weight_t, bias_p, bias_t, result1, result2, group)
+  result1: group_norm_mean_jvp(input_t, result1, group)
+  result2: group_norm_invstd_jvp(input_p, input_t, result1, result2, group)
+
+- name: ne_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!)
+  self: zeros_like(self)
+  result: self_t.zero_()
+
+- name: ne_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!)
+  self: zeros_like(self)
+  other: zeros_like(other)
+  result: self_t.zero_()
+
+- name: neg(Tensor self) -> Tensor
+  self: grad.neg()
+  result: auto_element_wise
+
+- name: _batch_norm_with_update(Tensor input, Tensor? weight, Tensor? bias, Tensor(a!) running_mean, Tensor(b!) running_var, float momentum, float eps) -> (Tensor, Tensor, Tensor, Tensor)
+  input, weight, bias: "grad.defined() ? batch_norm_backward(grad, input, weight, running_mean, running_var, result1, result2, /*update*/true, eps, grad_input_mask, retain_variables ? result3.clone() : result3) : std::tuple<Tensor, Tensor, Tensor>()"
+  result0: batch_norm_jvp(input_p, input_t, weight_p, weight_t, bias_p, bias_t, running_mean, running_var, result1, result2, true, eps)
+
+- name: _batch_norm_no_update(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, float momentum, float eps) -> (Tensor, Tensor, Tensor, Tensor)
+  input, weight, bias: "grad.defined() ? batch_norm_backward(grad, input, weight, running_mean, running_var, result1, result2, /*update*/false, eps, grad_input_mask, retain_variables ? result3.clone() : result3) : std::tuple<Tensor, Tensor, Tensor>()"
+  result0: batch_norm_jvp(input_p, input_t, weight_p, weight_t, bias_p, bias_t, running_mean, running_var, result1, result2, false, eps)
+
+- name: batch_norm_backward(Tensor grad_out, Tensor input, Tensor weight, Tensor? running_mean, Tensor? running_var, Tensor? save_mean, Tensor? save_var, bool update, float eps, bool[3] output_mask, Tensor reserve) -> (Tensor, Tensor, Tensor)
+  input, weight, grad_out: batchnorm_double_backward(input, weight, grads[0], grads[1], grads[2], grad_out, running_mean, running_var, update, eps, save_mean, save_var, grad_input_mask)
+  save_mean: not_implemented("batch_norm_backward save_mean")
+  save_var: not_implemented("batch_norm_backward save_var")
+  reserve: not_implemented("batch_norm_backward reserve")
+
+- name: nextafter(Tensor self, Tensor other) -> Tensor
+  self: not_implemented("nextafter")
+  other: not_implemented("nextafter")
+
+- name: norm.Scalar(Tensor self, Scalar p=2) -> Tensor
+  self: norm_backward(grad, self, p, result)
+  result: norm_jvp(self_p, self_t, p, result)
+
+- name: norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> Tensor
+  self: norm_backward(grad, self, p, result, dim, keepdim)
+  result: norm_jvp(self_p, self_t, p, result, dim, keepdim)
+
+- name: norm.ScalarOpt_dtype(Tensor self, Scalar? p, *, ScalarType dtype) -> Tensor
+  self: norm_backward(grad, self.to(grad.scalar_type()), p, result)
+  result: norm_jvp(self_p, self_t, p, result)
+
+- name: norm.ScalarOpt_dim_dtype(Tensor self, Scalar? p, int[1] dim, bool keepdim, *, ScalarType dtype) -> Tensor
+  self: norm_backward(grad, self.to(grad.scalar_type()), p, result, dim, keepdim)
+  result: norm_jvp(self_p, self_t, p, result, dim, keepdim)
+
+- name: linalg_vector_norm(Tensor self, Scalar ord=2, int[1]? dim=None, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor
+  self: linalg_vector_norm_backward(grad, self, ord, result, dim, keepdim)
+  result: linalg_vector_norm_jvp(self_p, self_t, ord, result, dim, keepdim)
+
+- name: _pdist_forward(Tensor self, float p=2) -> Tensor
+  self: _pdist_backward(grad, self, p, result)
+
+- name: _pdist_backward(Tensor grad, Tensor self, float p, Tensor pdist) -> Tensor
+  grad: not_implemented("_pdist_backward")
+  self: not_implemented("_pdist_backward")
+  pdist: not_implemented("_pdist_backward")
+
+- name: _euclidean_dist(Tensor x1, Tensor x2) -> Tensor
+  x1, x2: _euclidean_dist_backward(grad, x1, x2, result)
+
+- name: _cdist_forward(Tensor x1, Tensor x2, float p, int? compute_mode) -> Tensor
+  x1: _cdist_backward(grad.contiguous(), x1, x2, p, result)
+  x2: _cdist_backward(grad.mT().contiguous(), x2, x1, p, result.mT().contiguous())
+
+- name: _cdist_backward(Tensor grad, Tensor x1, Tensor x2, float p, Tensor cdist) -> Tensor
+  grad: not_implemented("_cdist_backward")
+  x1: not_implemented("_cdist_backward")
+  x2: not_implemented("_cdist_backward")
+  cdist: not_implemented("_cdist_backward")
+
+- name: normal_(Tensor(a!) self, float mean=0, float std=1, *, Generator? generator=None) -> Tensor(a!)
+  self: zeros_like(grad)
+  result: self_t.zero_()
+
+- name: normal.Tensor_float(Tensor mean, float std=1, *, Generator? generator=None) -> Tensor
+  mean: at::zeros_symint(mean.sym_sizes(), grad.options())
+  result: auto_element_wise
+
+- name: normal.float_Tensor(float mean, Tensor std, *, Generator? generator=None) -> Tensor
+  std: at::zeros_symint(std.sym_sizes(), grad.options())
+  result: auto_element_wise
+
+- name: normal.Tensor_Tensor(Tensor mean, Tensor std, *, Generator? generator=None) -> Tensor
+  mean: at::zeros_symint(mean.sym_sizes(), grad.options())
+  std: at::zeros_symint(std.sym_sizes(), grad.options())
+  result: zeros_like(mean_t)
+
+- name: linalg_householder_product(Tensor input, Tensor tau) -> Tensor
+  input, tau: householder_product_backward(grad, result, input, tau)
+  result: householder_product_jvp(input_t, tau_t, result, input_p, tau_p)
+
+- name: ormqr(Tensor self, Tensor input2, Tensor input3, bool left=True, bool transpose=False) -> Tensor
+  self, input2, input3: ormqr_backward(grad, result, self, input2, input3, left, transpose, grad_input_mask)
+
+- name: permute(Tensor(a) self, int[] dims) -> Tensor(a)
+  self: permute_backwards(grad, dims)
+  result: auto_linear
+
+- name: poisson(Tensor self, Generator? generator=None) -> Tensor
+  self: zeros_like(self)
+  result: auto_element_wise
+
+- name: pow.Tensor_Scalar(Tensor self, Scalar exponent) -> Tensor
+  self: pow_backward(grad, self, exponent)
+  result: auto_element_wise
+
+- name: pow.Tensor_Tensor(Tensor self, Tensor exponent) -> Tensor
+  self: pow_backward_self(grad, self, exponent)
+  exponent: pow_backward_exponent(grad, self, exponent, result)
+  result: (pow_backward_self(self_t.conj(), self_p, exponent_p) + pow_backward_exponent(exponent_t.conj(), self_p, exponent_p, result)).conj()
+
+- name: pow.Scalar(Scalar self, Tensor exponent) -> Tensor
+  exponent: pow_backward_exponent(grad, self, exponent, result)
+  result: auto_element_wise
+
+- name: prod(Tensor self, *, ScalarType? dtype=None) -> Tensor
+  self: prod_backward(grad, self.to(grad.scalar_type()), result)
+  result: (prod_backward(at::ones({}, result.options()).expand_as(result), self_p.to(result.scalar_type()), result) * self_t.conj()).sum().conj()
+
+- name: prod.dim_int(Tensor self, int dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor
+  self: prod_backward(grad, self.to(grad.scalar_type()), result, dim, keepdim)
+  result: (prod_backward(at::ones({}, result.options()).expand_as(result), self_p.to(result.scalar_type()), result, dim, keepdim) * self_t.conj()).sum(dim, keepdim).conj()
+
+- name: put(Tensor self, Tensor index, Tensor source, bool accumulate=False) -> Tensor
+  self: "accumulate ? grad : grad.put(index, zeros_like(source), false)"
+  index: non_differentiable
+  source: grad.take(index).reshape_as(source)
+  result: self_t.put(index, source_t, accumulate)
+
+- name: linalg_qr(Tensor A, str mode='reduced') -> (Tensor Q, Tensor R)
+  A: linalg_qr_backward(grad_Q, grad_R, Q, R, mode)
+  Q, R: linalg_qr_jvp(A_t, Q, R, mode)
+
+- name: rad2deg(Tensor self) -> Tensor
+  self: rad2deg_backward(grad)
+  result: auto_element_wise
+
+- name: random_.from(Tensor(a!) self, int from, int? to, *, Generator? generator=None) -> Tensor(a!)
+  self: zeros_like(grad)
+  result: self_t.zero_()
+
+- name: random_.to(Tensor(a!) self, int to, *, Generator? generator=None) -> Tensor(a!)
+  self: zeros_like(grad)
+  result: self_t.zero_()
+
+- name: random_(Tensor(a!) self, *, Generator? generator=None) -> Tensor(a!)
+  self: zeros_like(grad)
+  result: self_t.zero_()
+
+- name: reciprocal(Tensor self) -> Tensor
+  self: -grad * (result * result).conj()
+  result: auto_element_wise
+
+- name: remainder.Scalar(Tensor self, Scalar other) -> Tensor
+  self: grad
+  result: auto_element_wise
+
+- name: remainder.Tensor(Tensor self, Tensor other) -> Tensor
+  self: grad
+  other: -grad * self.div(other, /*rounding_mode=*/"floor")
+  result: self_t - other_t * self_p.div(other_p, /*rounding_mode=*/"floor")
+
+- name: renorm(Tensor self, Scalar p, int dim, Scalar maxnorm) -> Tensor
+  self: renorm_backward(grad, self, p, dim, maxnorm)
+  result: renorm_jvp(self_p, self_t, p, dim, maxnorm)
+
+- name: repeat(Tensor self, SymInt[] repeats) -> Tensor
+  self: repeat_backward(grad, repeats, self.sym_sizes())
+  result: auto_linear
+
+- name: special_entr(Tensor self) -> Tensor
+  self: grad * (-(1 + self.log()))
+  result: auto_element_wise
+
+- name: special_ndtri(Tensor self) -> Tensor
+  self: grad * std::sqrt(2 * M_PI) * (result.square() / 2).exp()
+  result: auto_element_wise
+
+- name: special_log_ndtr(Tensor self) -> Tensor
+  self: grad / std::sqrt(2 * M_PI) * (result + self.pow(2) / 2).neg().exp()
+  result: auto_element_wise
+
+# [Note: Sometimes view derivatives]
+# The following situation applies to other operations as well.
+# TODO: This note is only referenced by to_dense and to_sparse*. Make
+# this more generic if it's been referenced more than once.
+#
+# DO NOT define a backward for reshape!
+# reshape is special in that it sometimes returns a view, and sometimes not.
+# Defining a backward will make codegen spit out the forward call as
+# as_variable(baseType->reshape(self)),
+# making it impossible (hard) to detect when it is actually a view.
+# - name: reshape(Tensor self, IntArrayRef shape)
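+
+# Illustrative sketch (comment only, not consumed by codegen): the view-vs-copy
+# behaviour of reshape described above is observable from Python. Assuming a
+# standard PyTorch install:
+#
+#   import torch
+#   x = torch.arange(6.)
+#   v = x.reshape(2, 3)                     # contiguous input: a view
+#   print(v.data_ptr() == x.data_ptr())     # True
+#   y = x.reshape(2, 3).t().reshape(6)      # non-contiguous input: a copy
+#   print(y.data_ptr() == x.data_ptr())     # False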
+
+- name: _reshape_alias(Tensor(a) self, SymInt[] size, SymInt[] stride) -> Tensor(a)
+  self: grad.reshape_symint(self.sym_sizes())
+  result: auto_linear
+
+- name: round(Tensor self) -> Tensor
+  self: zeros_like(grad)
+  result: auto_element_wise
+
+- name: round.decimals(Tensor self, *, int decimals) -> Tensor
+  self: zeros_like(grad)
+  result: auto_element_wise
+
+- name: rsqrt(Tensor self) -> Tensor
+  self: -0.5 * grad * result.pow(3).conj()
+  result: auto_element_wise
+
+- name: scatter.src(Tensor self, int dim, Tensor index, Tensor src) -> Tensor
+  self: grad.scatter(dim, index, 0)
+  index: non_differentiable
+  src: grad.gather(dim, index)
+  result: self_t.scatter(dim, index, src_t)
+
+- name: scatter.value(Tensor self, int dim, Tensor index, Scalar value) -> Tensor
+  self: grad.scatter(dim, index, 0)
+  index: non_differentiable
+  result: self_t.scatter(dim, index, 0)
+
+- name: scatter_add(Tensor self, int dim, Tensor index, Tensor src) -> Tensor
+  self: grad
+  index: non_differentiable
+  src: grad.gather(dim, index)
+  result: scatter_add(self_t, dim, index, src_t)
+
+- name: select.int(Tensor(a) self, int dim, SymInt index) -> Tensor(a)
+  dispatch:
+    Default:
+      self: select_backward_symint(grad, self.sym_sizes(), dim, index)
+      result: auto_linear
+    AutogradNestedTensor:
+      self: _nested_select_backward_symint(grad, self, dim, index)
+
+- name: select_backward(Tensor grad_output, SymInt[] input_sizes, int dim, SymInt index) -> Tensor
+  grad_output: grad.select_symint(dim, index)
+  result: auto_linear
+
+- name: sigmoid(Tensor self) -> Tensor
+  self: sigmoid_backward(grad, result)
+  result: auto_element_wise
+
+- name: logit(Tensor self, float? eps=None) -> Tensor
+  self: "GradMode::is_enabled() ? infinitely_differentiable_logit_backward(grad, self, eps) : logit_backward(grad, self, eps)"
+  result: auto_element_wise
+
+- name: sign(Tensor self) -> Tensor
+  self: zeros_like(grad)
+  result: auto_element_wise
+
+- name: sgn(Tensor self) -> Tensor
+  self: sgn_backward(self, grad, result)
+  # Cannot use auto_element_wise here because the Jacobian is *not* Hermitian (in fact, it is symmetric)
+  # The function is not holomorphic, so there's no reason for its Jacobian to be Hermitian
+  # auto_element_wise has a name that's a bit deceiving in the complex case
+  result: sgn_backward(self_p, self_t, result)
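+
+# Hedged derivation sketch (comment only, not consumed by codegen) of why the
+# sgn Jacobian is symmetric rather than Hermitian: writing sgn(z) = z / |z| for
+# z != 0 and differentiating with Wirtinger calculus gives
+#
+#   d sgn = dz / (2|z|) - sgn(z)^2 * conj(dz) / (2|z|)
+#
+# The conj(dz) term is what breaks holomorphy: viewed as a real-linear map the
+# Jacobian is symmetric, so both formulas must be spelled out with sgn_backward
+# instead of relying on auto_element_wise.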
+
+- name: sin(Tensor self) -> Tensor
+  self: grad * self.cos().conj()
+  result: auto_element_wise
+
+- name: sinc(Tensor self) -> Tensor
+  self: sinc_backward(grad, self)
+  result: auto_element_wise
+
+- name: sinh(Tensor self) -> Tensor
+  self: grad * self.cosh().conj()
+  result: auto_element_wise
+
+- name: slice.Tensor(Tensor(a) self, int dim=0, SymInt? start=None, SymInt? end=None, SymInt step=1) -> Tensor(a)
+  self: slice_backward_wrapper(grad, self.sym_sizes(), dim, start, end, step)
+  result: auto_linear
+
+- name: slice_backward(Tensor grad_output, SymInt[] input_sizes, int dim, SymInt start, SymInt end, SymInt step) -> Tensor
+  grad_output: grad.slice_symint(dim, start, end, step)
+  result: auto_linear
+
+- name: slice_inverse(Tensor(a) self, Tensor src, int dim=0, SymInt? start=None, SymInt? end=None, SymInt step=1) -> Tensor(a)
+  self: grad.slice_symint(dim, start, end, step)
+  src: slice_scatter_symint(grad, zeros_like(self), dim, start, end, step)
+  result: auto_linear
+
+- name: slice_scatter(Tensor self, Tensor src, int dim=0, SymInt? start=None, SymInt? end=None, SymInt step=1) -> Tensor
+  self: slice_scatter_symint(grad, zeros_like(src), dim, start, end, step)
+  src: grad.slice_symint(dim, start, end, step)
+  result: auto_linear
+
+- name: select_scatter(Tensor self, Tensor src, int dim, SymInt index) -> Tensor
+  self: select_scatter_symint(grad, zeros_like(src), dim, index)
+  src: grad.select_symint(dim, index)
+  result: auto_linear
+
+- name: diagonal_scatter(Tensor self, Tensor src, int offset=0, int dim1=0, int dim2=1) -> Tensor
+  self: diagonal_scatter(grad, zeros_like(src), offset, dim1, dim2)
+  src: grad.diagonal(offset, dim1, dim2)
+  result: auto_linear
+
+- name: as_strided_scatter(Tensor self, Tensor src, SymInt[] size, SymInt[] stride, SymInt? storage_offset=None) -> Tensor
+  self: as_strided_scatter_backward(grad, TensorGeometry(self), TensorGeometry(src), size, stride, storage_offset)
+  # See Note [as_strided_scatter backward support]
+  src: grad.contiguous().as_strided_symint(size, stride, storage_offset)
+  result: auto_linear
+
+- name: _linalg_solve_ex(Tensor A, Tensor B, *, bool left=True, bool check_errors=False) -> (Tensor result, Tensor LU, Tensor pivots, Tensor info)
+  A, B: linalg_solve_backward(grad, result, A, LU, pivots, left, grad_input_mask[1])
+  result: "linalg_solve_jvp(A_t, B_t, result, LU, pivots, left, A_p.is_contiguous() && !A_p.is_complex())"
+  output_differentiability: [True, False, False, False]  # LU is an auxiliary tensor not exposed to the user
+
+- name: sort(Tensor self, int dim=-1, bool descending=False) -> (Tensor values, Tensor indices)
+  self: value_selecting_reduction_backward_symint(grad, dim, indices, self.sym_sizes(), true)
+  output_differentiability: [True, False]
+  values: gather_with_keepdimed_indices(self_t, dim, indices, true)
+
+- name: sort.stable(Tensor self, *, bool? stable, int dim=-1, bool descending=False) -> (Tensor values, Tensor indices)
+  self: value_selecting_reduction_backward_symint(grad, dim, indices, self.sym_sizes(), true)
+  output_differentiability: [True, False]
+  values: gather_with_keepdimed_indices(self_t, dim, indices, true)
+
+- name: split.Tensor(Tensor(a -> *) self, SymInt split_size, int dim=0) -> Tensor(a)[]
+  self: split_backward(grads, split_size, dim, self.sym_sizes(), self.options())
+  result: auto_linear
+
+- name: unsafe_split.Tensor(Tensor self, SymInt split_size, int dim=0) -> Tensor[]
+  self: split_backward(grads, split_size, dim, self.sym_sizes(), self.options())
+  result: auto_linear
+
+- name: split_with_sizes(Tensor(a -> *) self, SymInt[] split_sizes, int dim=0) -> Tensor(a)[]
+  dispatch:
+    Default:
+      self: split_with_sizes_backward(grads, split_sizes, dim, self.sym_sizes(), self.options())
+      result: auto_linear
+    AutogradNestedTensor:
+      self: _nested_split_with_sizes_backward(grads, split_sizes, dim, at::native::get_nested_tensor_impl(self)->get_nested_sizes(), self.options())
+
+- name: unsafe_split_with_sizes(Tensor self, SymInt[] split_sizes, int dim=0) -> Tensor[]
+  self: split_with_sizes_backward(grads, split_sizes, dim, self.sym_sizes(), self.options())
+  result: auto_linear
+
+- name: sqrt(Tensor self) -> Tensor
+  self: grad / (2 * result.conj())
+  result: auto_element_wise
+
+- name: squeeze(Tensor(a) self) -> Tensor(a)
+  self: unsqueeze_to(grad, self.sym_sizes())
+  result: auto_linear
+
+- name: squeeze.dim(Tensor(a) self, int dim) -> Tensor(a)
+  dispatch:
+    Default:
+      self: unsqueeze_to(grad, dim, self.sym_sizes())
+      result: auto_linear
+    AutogradNestedTensor:
+      self: grad.unsqueeze(dim)
+
+- name: squeeze.dims(Tensor(a) self, int[] dim) -> Tensor(a)
+  dispatch:
+    Default:
+      self: unsqueeze_to(grad, dim, self.sym_sizes())
+      result: auto_linear
+    AutogradNestedTensor:
+      self: unsqueeze_multiple(grad, dim, self.dim())
+
+- name: squeeze_(Tensor(a!) self) -> Tensor(a!)
+  self: unsqueeze_to(grad, self.sym_sizes())
+  result: auto_linear
+
+- name: squeeze_.dim(Tensor(a!) self, int dim) -> Tensor(a!)
+  self: unsqueeze_to(grad, dim, self.sym_sizes())
+  result: auto_linear
+
+- name: squeeze_.dims(Tensor(a!) self, int[] dim) -> Tensor(a!)
+  self: unsqueeze_to(grad, dim, self.sym_sizes())
+  result: auto_linear
+
+- name: std.correction(Tensor self, int[1]? dim=None, *, Scalar? correction=None, bool keepdim=False) -> Tensor
+  self: std_backward(result, grad, self, dim, correction, keepdim)
+  # pointwise (variance) + sum + sqrt
+  result: (at::real(var_backward(self_t.conj(), self_p, dim, correction, true).sum(dim.value_or(IntArrayRef({})), keepdim)) / (2. * result)).masked_fill_(result == 0, 0)
+
+- name: std_mean.correction(Tensor self, int[1]? dim=None, *, Scalar? correction=None, bool keepdim=False) -> (Tensor, Tensor)
+  self: std_mean_backward(grads[0], grads[1], self, result0, dim, correction, keepdim)
+  result0: (at::real(var_backward(self_t.conj(), self_p, dim, correction, true).sum(dim.value_or(IntArrayRef({})), keepdim)) / (2. * result0)).masked_fill_(result0 == 0, 0)
+  # linear
+  result1: mean(self_t, dim.value_or(IntArrayRef({})), keepdim)
+
+- name: sub.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
+  self: handle_r_to_c(self.scalar_type(), grad)
+  other: handle_r_to_c(other.scalar_type(), maybe_multiply(-grad, alpha.conj()))
+  result: self_t - maybe_multiply(other_t, alpha)
+
+- name: sub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> Tensor
+  self: handle_r_to_c(self.scalar_type(), grad)
+  result: auto_element_wise
+
+- name: rsub.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
+  self: handle_r_to_c(self.scalar_type(), maybe_multiply(-grad, alpha.conj()))
+  other: handle_r_to_c(other.scalar_type(), grad)
+  result: -maybe_multiply(self_t, alpha) + other_t
+
+- name: rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> Tensor
+  self: handle_r_to_c(self.scalar_type(), maybe_multiply(-grad, alpha.conj()))
+  result: auto_element_wise
+
+- name: sum(Tensor self, *, ScalarType? dtype=None) -> Tensor
+  dispatch:
+    Default:
+      self: grad.expand_symint(self.sym_sizes())
+      result: auto_linear
+    AutogradNestedTensor:
+      # TODO: replace this with grad.expand_as(self) when that is supported
+      self: ones_like(self) * grad
+      result: auto_linear
+
+- name: sum.dim_IntList(Tensor self, int[1]? dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor
+  dispatch:
+    Default:
+      self: sum_backward(grad, self.sym_sizes(), dim, keepdim)
+      result: auto_linear
+    AutogradNestedTensor:
+      # TODO: replace this function once semantics for nested tensor expand have been settled on
+      self: _nested_sum_backward(grad, self, dim, keepdim)
+
+- name: nansum(Tensor self, int[1]? dim=None, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor
+  self: nansum_backward(grad.to(self.scalar_type()), self, dim, keepdim)
+  result: at::where(self_p.isnan(), 0, self_t).sum(dim, keepdim, dtype)
+
+# We never call _linalg_svd with compute_uv=False in an autograd context, so we don't even consider it here
+- name: _linalg_svd(Tensor A, bool full_matrices=False, bool compute_uv=True, *, str? driver=None) -> (Tensor U, Tensor S, Tensor Vh)
+  A: "svd_backward(full_matrices && grad_U.defined() ? grad_U.narrow_symint(-1, 0, S.sym_size(-1)) : grad_U,
+                   grad_S,
+                   full_matrices && grad_Vh.defined() ? grad_Vh.narrow_symint(-2, 0, S.sym_size(-1)) : grad_Vh,
+                   full_matrices ? U.narrow_symint(-1, 0, S.sym_size(-1)) : U,
+                   S,
+                   full_matrices ? Vh.narrow_symint(-2, 0, S.sym_size(-1)) : Vh)"
+  U, S, Vh: linalg_svd_jvp(A_t, U, S, Vh, full_matrices)
+
+- name: _linalg_eigh(Tensor A, str UPLO="L", bool compute_v=True) -> (Tensor eigenvalues, Tensor eigenvectors)
+  A: linalg_eig_backward(grads[0], grads[1], eigenvalues, eigenvectors, /*is_hermitian=*/true)
+  eigenvalues, eigenvectors: linalg_eig_jvp(A_t, eigenvalues, eigenvectors, /*is_hermitian=*/true)
+
+- name: linalg_eig(Tensor self) -> (Tensor eigenvalues, Tensor eigenvectors)
+  self: handle_r_to_c(self.scalar_type(), linalg_eig_backward(grads[0], grads[1], eigenvalues, eigenvectors, /*is_hermitian=*/false))
+  eigenvalues, eigenvectors: linalg_eig_jvp(self_t, eigenvalues, eigenvectors, /*is_hermitian=*/false)
+
+- name: t(Tensor(a) self) -> Tensor(a)
+  self: grad.t()
+  result: auto_linear
+
+- name: t_(Tensor(a!) self) -> Tensor(a!)
+  self: grad.t()
+  result: auto_linear
+
+- name: one_hot(Tensor self, int num_classes=-1) -> Tensor
+  self: non_differentiable
+
+- name: flip(Tensor self, int[] dims) -> Tensor
+  self: grad.flip(dims)
+  result: auto_linear
+
+- name: roll(Tensor self, SymInt[1] shifts, int[1] dims=[]) -> Tensor
+  self: grad.roll_symint(fmap(reverse_list_symint(shifts), [](c10::SymInt i){return -i;}), reverse_list(dims))
+  result: auto_linear
+
+- name: rot90(Tensor self, int k=1, int[] dims=[0,1]) -> Tensor
+  self: grad.rot90(-k, dims)
+  result: auto_linear
+
+- name: take(Tensor self, Tensor index) -> Tensor
+  self: take_backward(grad, self, index)
+  index: non_differentiable
+  result: auto_linear
+
+- name: tan(Tensor self) -> Tensor
+  self: grad * (1 + result.pow(2)).conj()
+  result: auto_element_wise
+
+- name: tanh(Tensor self) -> Tensor
+  self: tanh_backward(grad, result)
+  result: auto_element_wise
+
+- name: topk(Tensor self, SymInt k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
+  self: value_selecting_reduction_backward_symint(grad, dim, indices, self.sym_sizes(), true)
+  output_differentiability: [True, False]
+  values: gather(self_t, dim, indices)
+
+- name: trace(Tensor self) -> Tensor
+  self: trace_backward_symint(grad, self.sym_sizes())
+  result: auto_linear
+
+- name: transpose.int(Tensor(a) self, int dim0, int dim1) -> Tensor(a)
+  self: grad.transpose(dim0, dim1)
+  result: auto_linear
+
+- name: transpose_(Tensor(a!) self, int dim0, int dim1) -> Tensor(a!)
+  self: grad.transpose(dim0, dim1)
+  result: auto_linear
+
+- name: triangular_solve(Tensor self, Tensor A, bool upper=True, bool transpose=False, bool unitriangular=False) -> (Tensor solution, Tensor cloned_coefficient)
+  self, A: triangular_solve_backward(grad_solution, grad_cloned_coefficient, self, A, solution, upper, transpose, unitriangular, grad_input_mask)
+  solution: triangular_solve_jvp(solution, A_p, A_t, self_t, upper, transpose, unitriangular)
+  cloned_coefficient: A_t
+
+- name: linalg_solve_triangular(Tensor self, Tensor B, *, bool upper, bool left=True, bool unitriangular=False) -> Tensor
+  self, B: linalg_solve_triangular_backward(grad, self, result, upper, left, unitriangular, grad_input_mask)
+  result: linalg_solve_triangular_forward_AD(self_t, B_t, self_p, result, upper, left, unitriangular)
+
+- name: tril(Tensor self, SymInt diagonal=0) -> Tensor
+  self: grad.tril_symint(diagonal)
+  result: auto_linear
+
+- name: triu(Tensor self, SymInt diagonal=0) -> Tensor
+  self: grad.triu_symint(diagonal)
+  result: auto_linear
+
+- name: trunc(Tensor self) -> Tensor
+  self: zeros_like(grad)
+  result: auto_element_wise
+
+- name: hash_tensor(Tensor self, int[1] dim=[], *, bool keepdim=False, int mode=0) -> Tensor
+  output_differentiability: [False]
+
+# DO NOT define a backward for to_dense
+# See [Note: Sometimes view derivatives]
+# - name: to_dense(Tensor self, ScalarType? dtype=None, *, bool? masked_grad=None) -> Tensor
+#
+- name: _to_dense(Tensor self, ScalarType? dtype=None, bool? masked_grad=None) -> Tensor
+  self: to_dense_backward(grad, self, masked_grad)
+
+# DO NOT define a backward for to_sparse.sparse_dim
+# See [Note: Sometimes view derivatives]
+# - name: to_sparse.sparse_dim(Tensor self, int sparse_dim) -> Tensor
+#
+- name: _to_sparse.sparse_dim(Tensor self, int sparse_dim) -> Tensor
+  self: to_sparse_backward(grad, self.layout(), self.sym_blocksize())
+
+# DO NOT define a backward for to_sparse
+# See [Note: Sometimes view derivatives]
+# - name: to_sparse(Tensor self, *, Layout? layout=None, int[2]? blocksize=None, int? dense_dim=None) -> Tensor
+#
+- name: _to_sparse(Tensor self, *, Layout? layout=None, int[2]? blocksize=None, int? dense_dim=None) -> Tensor
+  self: to_sparse_backward(grad, self.layout(), self.sym_blocksize())
+
+# DO NOT define a backward for to_sparse_csr
+# See [Note: Sometimes view derivatives]
+# - name: to_sparse_csr(Tensor self, int? dense_dim=None) -> Tensor
+#
+- name: _to_sparse_csr(Tensor self, int? dense_dim=None) -> Tensor
+  self: to_sparse_backward(grad, self.layout(), self.sym_blocksize())
+
+# DO NOT define a backward for to_sparse_csc
+# See [Note: Sometimes view derivatives]
+# - name: to_sparse_csc(Tensor self, int? dense_dim=None) -> Tensor
+#
+- name: _to_sparse_csc(Tensor self, int? dense_dim=None) -> Tensor
+  self: to_sparse_backward(grad, self.layout(), self.sym_blocksize())
+
+# DO NOT define a backward for to_sparse_bsr
+# See [Note: Sometimes view derivatives]
+# - name: to_sparse_bsr(Tensor self, int[2] blocksize, int? dense_dim=None) -> Tensor
+#
+- name: _to_sparse_bsr(Tensor self, int[2] blocksize, int? dense_dim=None) -> Tensor
+  self: to_sparse_backward(grad, self.layout(), self.sym_blocksize())
+
+# DO NOT define a backward for to_sparse_bsc
+# See [Note: Sometimes view derivatives]
+# - name: to_sparse_bsc(Tensor self, int[2] blocksize, int? dense_dim=None) -> Tensor
+#
+- name: _to_sparse_bsc(Tensor self, int[2] blocksize, int? dense_dim=None) -> Tensor
+  self: to_sparse_backward(grad, self.layout(), self.sym_blocksize())
+
+- name: to_mkldnn(Tensor self, ScalarType? dtype=None) -> Tensor
+  self: to_mkldnn_backward(grad, self)
+
+- name: unfold(Tensor(a) self, int dimension, int size, int step) -> Tensor(a)
+  self: unfold_backward_symint(grad, self.sym_sizes(), dimension, size, step)
+  result: auto_linear
+
+- name: unfold_backward(Tensor grad_in, SymInt[] input_sizes, int dim, int size, int step) -> Tensor
+  grad_in: grad.unfold(dim, size, step)
+  result: auto_linear
+
+- name: uniform_(Tensor(a!) self, float from=0, float to=1, *, Generator? generator=None) -> Tensor(a!)
+  self: zeros_like(grad)
+  result: self_t.zero_()
+
+- name: _unique(Tensor self, bool sorted=True, bool return_inverse=False) -> (Tensor, Tensor)
+  output_differentiability: [True, False]
+  self: not_implemented("_unique")
+
+- name: unique_dim(Tensor self, int dim, bool sorted=True, bool return_inverse=False, bool return_counts=False) -> (Tensor, Tensor, Tensor)
+  output_differentiability: [True, False, False]
+  self: not_implemented("unique_dim")
+
+- name: unique_consecutive(Tensor self, bool return_inverse=False, bool return_counts=False, int? dim=None) -> (Tensor, Tensor, Tensor)
+
+- name: unique_consecutive(Tensor self, bool return_inverse=False, bool return_counts=False, int? dim=None) -> (Tensor, Tensor, Tensor)
+  output_differentiability: [True, False, False]
+  self: not_implemented("unique_consecutive")
+
+- name: unique_dim_consecutive(Tensor self, int dim, bool return_inverse=False, bool return_counts=False) -> (Tensor, Tensor, Tensor)
+  output_differentiability: [True, False, False]
+  self: not_implemented("unique_dim_consecutive")
+
+- name: _unique2(Tensor self, bool sorted=True, bool return_inverse=False, bool return_counts=False) -> (Tensor, Tensor, Tensor)
+  output_differentiability: [True, False, False]
+  self: not_implemented("_unique2")
+
+- name: _unsafe_view(Tensor self, SymInt[] size) -> Tensor
+  self: grad.reshape_symint(self.sym_sizes())
+  result: auto_linear
+
+- name: lift(Tensor self) -> Tensor
+  self: grad
+  result: auto_linear
+
+- name: lift_fresh(Tensor(a) self) -> Tensor(a)
+  self: grad
+  result: auto_linear
+
+- name: unsqueeze(Tensor(a) self, int dim) -> Tensor(a)
+  self: grad.squeeze(dim)
+  result: auto_linear
+
+- name: unsqueeze_(Tensor(a!) self, int dim) -> Tensor(a!)
+  self: grad.squeeze(dim)
+  result: auto_linear
+
+- name: var.correction(Tensor self, int[1]? dim=None, *, Scalar? correction=None, bool keepdim=False) -> Tensor
+  self: var_backward(grad, self, dim, correction, keepdim)
+  # pointwise + sum
+  result: at::real(var_backward(self_t.conj(), self_p, dim, correction, true).sum(dim.value_or(IntArrayRef({})), keepdim))
+
+- name: var_mean.correction(Tensor self, int[1]? dim=None, *, Scalar? correction=None, bool keepdim=False) -> (Tensor, Tensor)
+  self: var_mean_backward(grads[0], grads[1], self, dim, correction, keepdim)
+  result0: at::real(var_backward(self_t.conj(), self_p, dim, correction, true).sum(dim.value_or(IntArrayRef({})), keepdim))
+  # linear
+  result1: mean(self_t, dim.value_or(IntArrayRef({})), keepdim)
+
+- name: view(Tensor(a) self, SymInt[] size) -> Tensor(a)
+  dispatch:
+    Default:
+      self: grad.reshape_symint(self.sym_sizes())
+      result: auto_linear
+    AutogradNestedTensor:
+      self: grad.reshape_as(self)
+      result: auto_linear
+
+- name: view.dtype(Tensor(a) self, ScalarType dtype) -> Tensor(a)
+  output_differentiability: [False]
+
+- name: view_as_real(Tensor(a) self) -> Tensor(a)
+  self: at::view_as_complex(grad.contiguous()) # gx0 + 1j * gx1
+  result: at::view_as_real(self_t)
+
+- name: view_as_complex(Tensor(a) self) -> Tensor(a)
+  self: at::view_as_real(grad.contiguous().resolve_conj()) # [gx, gy]
+  result: at::view_as_complex(self_t)
+
+- name: where.self(Tensor condition, Tensor self, Tensor other) -> Tensor
+  condition: non_differentiable
+  self: where(condition, grad, 0)
+  other: where(condition, 0, grad)
+  result: where(condition, self_t, other_t)
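+
+# The where.self entry above routes the incoming gradient through the mask,
+# so each input only receives gradient where it was selected; a minimal
+# sketch assuming standard PyTorch (illustrative only):
+#
+#   import torch
+#   cond = torch.tensor([True, False, True])
+#   a = torch.zeros(3, requires_grad=True)
+#   b = torch.ones(3, requires_grad=True)
+#   torch.where(cond, a, b).sum().backward()
+#   assert a.grad.tolist() == [1.0, 0.0, 1.0]  # where(cond, grad, 0)
+#   assert b.grad.tolist() == [0.0, 1.0, 0.0]  # where(cond, 0, grad)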
+
+# weight_norm_cuda_interface_backward does not have an explicitly defined derivative, so if we do happen
+# to be running backward with create_graph=True, fall back to a backward function that uses
+# differentiable ops.
+- name: _weight_norm_interface(Tensor v, Tensor g, int dim=0) -> (Tensor, Tensor)
+  v, g: "grad.defined() ? (GradMode::is_enabled() ? _weight_norm_differentiable_backward(grad.contiguous(), v, g, result1, dim) : _weight_norm_interface_backward(grad.contiguous(), v, g, result1, dim)) : std::tuple<Tensor, Tensor>()"
+
+- name: zero_(Tensor(a!) self) -> Tensor(a!)
+  self: zeros_like(grad)
+  result: auto_linear
+
+- name: sparse_mask(Tensor self, Tensor mask) -> Tensor
+  self: sparse_mask_backward(grad, mask, self.layout())
+  mask: non_differentiable
+
+- name: _sparse_coo_tensor_with_dims_and_tensors(int sparse_dim, int dense_dim, SymInt[] size, Tensor indices, Tensor values, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False, bool? is_coalesced=None) -> Tensor
+  indices: non_differentiable
+  values: grad.sparse_mask(result)._values()
+
+- name: sparse_compressed_tensor.comp_plain_value_size(Tensor compressed_indices, Tensor plain_indices, Tensor values, SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor
+  compressed_indices: non_differentiable
+  plain_indices: non_differentiable
+  # TODO: remove to_dense after gh-107381 is fixed
+  values: grad.to_dense().sparse_mask(result).values()
+
+- name: _sparse_sum.dim(Tensor self, int[1] dim) -> Tensor
+  self: at::_sparse_sum_backward(grad, self, dim)
+
+- name: _standard_gamma(Tensor self, Generator? generator=None) -> Tensor
+  self: grad * _standard_gamma_grad(self, result)
+
+- name: _standard_gamma_grad(Tensor self, Tensor output) -> Tensor
+  self: not_implemented("_standard_gamma_grad")
+
+- name: values(Tensor(a) self) -> Tensor(a)
+  dispatch:
+    Default:
+      self: values_backward(grad, self)
+    AutogradNestedTensor:
+      self: at::_nested_view_from_buffer(grad.contiguous(), self._nested_tensor_size(), self._nested_tensor_strides(), self._nested_tensor_storage_offsets())
+
+# Why is _values() not differentiable?
+# See NOTE [ Sparse: autograd and API ]
+- name: _values(Tensor(a) self) -> Tensor(a)
+  output_differentiability: [False]
+
+# NN
+- name: _trilinear(Tensor i1, Tensor i2, Tensor i3, int[] expand1, int[] expand2, int[] expand3, int[] sumdim, int unroll_dim=1) -> Tensor
+  i1, i2, i3: "_trilinear_backward(grad,
+               wrap_opt_if(i1, grad_input_mask[1] || grad_input_mask[2]),
+               wrap_opt_if(i2, grad_input_mask[0] || grad_input_mask[2]),
+               wrap_opt_if(i3, grad_input_mask[0] || grad_input_mask[1]),
+               expand1, expand2, expand3, sumdim, grad_input_mask)"
+  result: "_trilinear(i1_t, i2_p, i3_p, expand1, expand2, expand3, sumdim, unroll_dim)
+           + _trilinear(i1_p, i2_t, i3_p, expand1, expand2, expand3, sumdim, unroll_dim)
+           + _trilinear(i1_p, i2_p, i3_t, expand1, expand2, expand3, sumdim, unroll_dim)"
+
+- name: constant_pad_nd(Tensor self, SymInt[] pad, Scalar value=0) -> Tensor
+  self: constant_pad_nd_backward(grad, pad)
+  result: constant_pad_nd_symint(self_t, pad, 0)
+
+- name: binary_cross_entropy(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean) -> Tensor
+  self: binary_cross_entropy_backward(grad, self, target, weight, reduction)
+  target: binary_cross_entropy_target_backward(grad, self, target, weight, reduction)
+  result: "apply_loss_reduction(
+             binary_cross_entropy_backward(self_t, self_p, target_p, weight, at::Reduction::None)
+             + binary_cross_entropy_target_backward(target_t, self_p, target_p, weight, at::Reduction::None),
+           reduction)"
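+
+# Formulas like the one above are the kind of thing torch.autograd.gradcheck
+# verifies against numerical differentiation; a minimal sketch, assuming
+# standard PyTorch (illustrative only; inputs kept away from 0 and 1 where
+# the BCE gradient is ill-conditioned):
+#
+#   import torch
+#   import torch.nn.functional as F
+#   x = (torch.rand(5, dtype=torch.double) * 0.8 + 0.1).requires_grad_()
+#   t = torch.rand(5, dtype=torch.double)
+#   torch.autograd.gradcheck(lambda x: F.binary_cross_entropy(x, t), (x,))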
+
+- name: binary_cross_entropy_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean) -> Tensor
+  self: binary_cross_entropy_double_backward(grad_output, grad, self, target, weight, reduction)
+  target: binary_cross_entropy_double_backward_target(grad, grad_output, self, target, weight, reduction)
+  grad_output: binary_cross_entropy_double_backward_grad_output(grad, self, target, weight, reduction)
+  result: "binary_cross_entropy_double_backward(grad_output_p, self_t, self_p, target_p, weight, reduction)
+           + binary_cross_entropy_double_backward_target(target_t, grad_output_p, self_p, target_p, weight, reduction)
+           + binary_cross_entropy_double_backward_grad_output(grad_output_t, self_p, target_p, weight, reduction)"
+
+- name: binary_cross_entropy_with_logits(Tensor self, Tensor target, Tensor? weight=None, Tensor? pos_weight=None, int reduction=Mean) -> Tensor
+  self: binary_cross_entropy_with_logits_backward(grad, self, target, weight, pos_weight, reduction)
+  target: binary_cross_entropy_with_logits_target_backward(grad, self, target, weight, pos_weight, reduction)
+  result: "apply_loss_reduction(
+             binary_cross_entropy_with_logits_backward(self_t, self_p, target_p, weight, pos_weight, at::Reduction::None)
+             + binary_cross_entropy_with_logits_target_backward(target_t, self_p, target_p, weight, pos_weight, at::Reduction::None),
+           reduction)"
+
+- name: embedding(Tensor weight, Tensor indices, SymInt padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> Tensor
+  indices: non_differentiable
+  weight: embedding_backward_symint(grad, indices, weight.sym_size(0), padding_idx, scale_grad_by_freq, sparse)
+  result: auto_linear
+
+- name: embedding_dense_backward(Tensor grad_output, Tensor indices, SymInt num_weights, SymInt padding_idx, bool scale_grad_by_freq) -> Tensor
+  grad_output: embedding_dense_double_backward_symint(grad, indices, padding_idx)
+  indices: non_differentiable
+  result: auto_linear
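+
+# The embedding weight gradient above is a scatter-add of the output gradient
+# rows at the looked-up indices; a minimal sketch assuming standard PyTorch
+# (illustrative only):
+#
+#   import torch
+#   import torch.nn.functional as F
+#   w = torch.randn(10, 4, requires_grad=True)
+#   idx = torch.tensor([3, 3, 7])
+#   F.embedding(idx, w).sum().backward()
+#   # row 3 was looked up twice, so its gradient accumulates twice
+#   assert torch.equal(w.grad[3], torch.full((4,), 2.0))
+#   assert torch.equal(w.grad[7], torch.ones(4))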
+
+- name: _embedding_bag(Tensor weight, Tensor indices, Tensor offsets, bool scale_grad_by_freq=False, int mode=0, bool sparse=False, Tensor? per_sample_weights=None, bool include_last_offset=False, int padding_idx=-1) -> (Tensor, Tensor, Tensor, Tensor)
+  indices: non_differentiable
+  offsets: non_differentiable
+  weight: _embedding_bag_backward_symint(grad, indices, offsets, result1, result2, result3, weight.sym_size(0), scale_grad_by_freq, mode, sparse, per_sample_weights, padding_idx)
+  per_sample_weights: _embedding_bag_per_sample_weights_backward(grad, weight, indices, offsets, result1, mode, padding_idx)
+
+- name: _embedding_bag_backward(Tensor grad, Tensor indices, Tensor offsets, Tensor offset2bag, Tensor bag_size, Tensor maximum_indices, SymInt num_weights, bool scale_grad_by_freq, int mode, bool sparse, Tensor? per_sample_weights, int padding_idx=-1) -> Tensor
+  grad: not_implemented("_embedding_bag_backward")
+  indices: non_differentiable
+  offsets: non_differentiable
+  offset2bag: non_differentiable
+  bag_size: non_differentiable
+  maximum_indices: non_differentiable
+  per_sample_weights: not_implemented("_embedding_bag_backward")
+
+- name: _embedding_bag_dense_backward(Tensor grad, Tensor indices, Tensor offset2bag, Tensor bag_size, Tensor maximum_indices, SymInt num_weights, bool scale_grad_by_freq, int mode, Tensor? per_sample_weights, int padding_idx=-1) -> Tensor
+  grad: not_implemented("_embedding_bag_dense_backward")
+  indices: non_differentiable
+  offset2bag: non_differentiable
+  bag_size: non_differentiable
+  maximum_indices: non_differentiable
+  per_sample_weights: not_implemented("_embedding_bag_dense_backward")
+
+- name: embedding_renorm_(Tensor(a!) self, Tensor indices, float max_norm, float norm_type) -> Tensor(a!)
+  indices: non_differentiable
+  self: not_implemented("embedding_renorm")
+
+- name: mse_loss(Tensor self, Tensor target, int reduction=Mean) -> Tensor
+  self: mse_loss_backward(grad, self, target, reduction)
+  target: mse_loss_backward(grad, target, self, reduction)
+  result: apply_loss_reduction(mse_loss_backward(self_t.conj(), self_p, target_p, at::Reduction::None).conj() + mse_loss_backward(target_t.conj(), target_p, self_p, at::Reduction::None).conj(), reduction)
+
+- name: multi_margin_loss(Tensor self, Tensor target, Scalar p=1, Scalar margin=1, Tensor? weight=None, int reduction=Mean) -> Tensor
+  self: multi_margin_loss_backward(grad, self, target, p, margin, weight, reduction)
+  target: non_differentiable
+
+- name: multilabel_margin_loss_forward(Tensor self, Tensor target, int reduction) -> (Tensor output, Tensor is_target)
+  self: multilabel_margin_loss_backward(grad, self, target, reduction, is_target)
+  target: non_differentiable
+
+- name: nll_loss_forward(Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index) -> (Tensor output, Tensor total_weight)
+  self: nll_loss_backward_symint(grad, self, target, weight, reduction, ignore_index, total_weight)
+  target: non_differentiable
+  output: std::get<0>(nll_loss_forward_symint(self_t, target, weight, reduction, ignore_index))
+
+- name: nll_loss2d_forward(Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index) -> (Tensor output, Tensor total_weight)
+  self: nll_loss2d_backward_symint(grad, self, target, weight, reduction, ignore_index, total_weight)
+  target: non_differentiable
+  output: std::get<0>(nll_loss2d_forward_symint(self_t, target, weight, reduction, ignore_index))
+
+- name: smooth_l1_loss(Tensor self, Tensor target, int reduction=Mean, float beta=1.0) -> Tensor
+  self: smooth_l1_loss_backward(grad, self, target, reduction, beta)
+  target: smooth_l1_loss_backward(grad, target, self, reduction, beta)
+  result: apply_loss_reduction(smooth_l1_loss_backward(self_t.conj(), self_p, target_p, at::Reduction::None, beta).conj() + smooth_l1_loss_backward(target_t.conj(), target_p, self_p, at::Reduction::None, beta).conj(), reduction)
+
+- name: huber_loss(Tensor self, Tensor target, int reduction=Mean, float delta=1.0) -> Tensor
+  self: huber_loss_backward(grad, self, target, reduction, delta)
+  target: huber_loss_backward(grad, target, self, reduction, delta)
+  result: apply_loss_reduction(huber_loss_backward(self_t.conj(), self_p, target_p, at::Reduction::None, delta).conj() + huber_loss_backward(target_t.conj(), target_p, self_p, at::Reduction::None, delta).conj(), reduction)
+
+- name: soft_margin_loss(Tensor self, Tensor target, int reduction=Mean) -> Tensor
+  self: soft_margin_loss_backward(grad, self, target, reduction)
+  result: apply_loss_reduction(soft_margin_loss_backward(self_t.conj(), self_p, target, at::Reduction::None).conj(), reduction)
+
+- name: relu(Tensor self) -> Tensor
+  self: threshold_backward(grad, result, 0)
+  result: auto_element_wise
+
+- name: silu(Tensor self) -> Tensor
+  self: "GradMode::is_enabled() ? infinitely_differentiable_silu_backward(grad, self) : silu_backward(grad, self)"
+  result: auto_element_wise
+
+- name: mish(Tensor self) -> Tensor
+  self: "GradMode::is_enabled() ? infinitely_differentiable_mish_backward(grad, self) : mish_backward(grad, self)"
+  result: auto_element_wise
+
+- name: elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> Tensor
+  self: elu_backward(grad, alpha, scale, input_scale, /* is_result */ false, self)
+  result: auto_element_wise
+
+- name: elu_(Tensor(a!) self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> Tensor(a!)
+  self: elu_backward(grad, alpha, scale, input_scale, /* is_result */ true, result)
+  result: self_t.copy_(elu_backward(original_self_t, alpha, scale, input_scale, /* is_result */ true, result))
+
+- name: celu(Tensor self, Scalar alpha=1.0) -> Tensor
+  self: elu_backward(grad, alpha, 1, 1.0/alpha.toFloat(), /* is_result */ false, self)
+  result: auto_element_wise
+
+- name: celu_(Tensor(a!) self, Scalar alpha=1.0) -> Tensor(a!)
+  self: elu_backward(grad, alpha, 1, 1.0/alpha.toFloat(), /* is_result */ true, result)
+  result: self_t.copy_(elu_backward(original_self_t, alpha, 1, 1.0/alpha.toFloat(), /* is_result */ true, result))
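+
+# celu above reuses elu_backward with scale=1 and input_scale=1/alpha, which
+# matches the identity celu(x, a) = max(0, x) + min(0, a * (exp(x / a) - 1));
+# a minimal numerical sketch, assuming standard PyTorch (illustrative only):
+#
+#   import torch
+#   import torch.nn.functional as F
+#   x, a = torch.randn(100), 1.7
+#   manual = x.clamp(min=0) + (a * torch.expm1(x / a)).clamp(max=0)
+#   assert torch.allclose(F.celu(x, alpha=a), manual)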
+
+- name: gelu(Tensor self, *, str approximate='none') -> Tensor
+  self: gelu_backward(grad, self, approximate)
+  result: auto_element_wise
+
+- name: gelu_backward(Tensor grad_output, Tensor self, *, str approximate='none') -> Tensor
+  grad_output: gelu_backward(grad, self, approximate)
+  self: gelu_double_backward(grad, grad_output, self, approximate)
+  result: gelu_backward(grad_output_t, self_p, approximate) + gelu_double_backward(self_t, grad_output_p, self_p, approximate)
+
+- name: glu(Tensor self, int dim=-1) -> Tensor
+  # TODO: glu_backward can benefit from forward result,
+  # and forward ad/forward over reverse ad for that matter
+  self: glu_backward(grad, self, dim)
+  result: glu_jvp(result, self_p, self_t, dim)
+
+- name: hardshrink(Tensor self, Scalar lambd=0.5) -> Tensor
+  self: hardshrink_backward(grad, self, lambd)
+  result: auto_element_wise
+
+- name: hardshrink_backward(Tensor grad_out, Tensor self, Scalar lambd) -> Tensor
+  grad_out: hardshrink_backward(grad, self, lambd)
+  self: zeros_like(grad)
+  result: at::where((self_p > lambd).logical_or(self_p < -lambd), grad_out_t, at::zeros({}, result.options()).expand_as(result))
+
+- name: hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> Tensor
+  self: hardtanh_backward(grad, self, min_val, max_val)
+  result: auto_element_wise
+
+- name: leaky_relu(Tensor self, Scalar negative_slope=0.01) -> Tensor
+  self: leaky_relu_backward(grad, self, negative_slope, false)
+  result: auto_element_wise
+
+- name: leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> Tensor(a!)
+  self: leaky_relu_backward(grad, result, negative_slope, true)
+  result: self_t.copy_(leaky_relu_backward(original_self_t.conj(), result, negative_slope, true).conj())
+
+- name: log_sigmoid_forward(Tensor self) -> (Tensor output, Tensor buffer)
+  self: log_sigmoid_backward(grad, self, buffer)
+  output: log_sigmoid_backward(self_t.conj(), self_p, buffer).conj()
+  output_differentiability: [True, False]
+
+- name: _log_softmax(Tensor self, int dim, bool half_to_float) -> Tensor
+  self: _log_softmax_backward_data(grad, result, dim, self.scalar_type())
+  result: self_t - logsumexp_jvp(self_p, self_t, {dim}, true)
+
+- name: _sparse_log_softmax(Tensor self, int dim, bool half_to_float) -> Tensor
+  self: _sparse_log_softmax_backward_data(grad, result, dim, self)
+
+- name: _masked_softmax(Tensor self, Tensor mask, int? dim=None, int? mask_type=None) -> Tensor
+  self: _masked_softmax_backward(grad, result, mask, dim)
+  mask: non_differentiable
+
+- name: _prelu_kernel(Tensor self, Tensor weight) -> Tensor
+  self, weight: "grad.defined() ? _prelu_kernel_backward(grad, self, weight) : std::tuple<Tensor, Tensor>()"
+  result: at::where(self_p >= 0, self_t, weight_p * self_t + weight_t * self_p)
+
+- name: _prelu_kernel_backward(Tensor grad_output, Tensor self, Tensor weight) -> (Tensor, Tensor)
+  grad_output: "grads[0].defined() ?
+                  (grads[1].defined() ? at::where(self >= 0, grads[0], grads[0] * weight + grads[1] * self)
+                                      : at::where(self >= 0, grads[0], grads[0] * weight))
+                  : at::where(self >= 0, at::zeros({}, grad_output.options()), grads[1] * self)"
+  self: "grads[1].defined() ? at::where(self >= 0, at::zeros({}, self.options()), grad_output * grads[1]) : zeros_like(self)"
+  weight: "grads[0].defined() ? at::where(self >= 0, at::zeros({}, weight.options()), grad_output * grads[0]) : zeros_like(self)"
+  result0: at::where(self_p >= 0, grad_output_t, grad_output_t * weight_p + grad_output_p * weight_t)
+  result1: at::where(self_p >= 0, at::zeros({}, self_p.options()), grad_output_p * self_t + grad_output_t * self_p)
+
+- name: rrelu_with_noise(Tensor self, Tensor(b!) noise, Scalar lower=0.125, Scalar upper=0.3333333333333333, bool training=False, Generator? generator=None) -> Tensor
+  self: rrelu_with_noise_backward(grad, self, noise, lower, upper, training, false)
+  result: auto_element_wise
+
+- name: rrelu_with_noise_(Tensor(a!) self, Tensor(b!) noise, Scalar lower=0.125, Scalar upper=0.3333333333333333, bool training=False, Generator? generator=None) -> Tensor(a!)
+  self: rrelu_with_noise_backward(grad, result, noise, lower, upper, training, true)
+
+- name: rrelu_with_noise_functional(Tensor self, Tensor noise, Scalar lower=0.125, Scalar upper=0.3333333333333333, bool training=False, Generator? generator=None) -> (Tensor, Tensor noise_out)
+  noise: non_differentiable
+  self: rrelu_with_noise_backward(grad, self, noise, lower, upper, training, false)
+
+- name: _softmax(Tensor self, int dim, bool half_to_float) -> Tensor
+  self: _softmax_backward_data(grad, result, dim, self.scalar_type())
+  result: result * (self_t - logsumexp_jvp(self_p, self_t, {dim}, true))
+
+- name: _sparse_softmax(Tensor self, int dim, bool half_to_float) -> Tensor
+  self: _sparse_softmax_backward_data(grad, result, dim, self)
+
+- name: _sparse_sparse_matmul(Tensor self, Tensor other) -> Tensor
+  self: sparse_sparse_matmul_backward(grad, self, other, 0)
+  other: sparse_sparse_matmul_backward(grad, self, other, 1)
+
+- name: softplus(Tensor self, Scalar beta=1, Scalar threshold=20) -> Tensor
+  self: softplus_backward(grad, self, beta, threshold)
+  result: auto_element_wise
+
+- name: softshrink(Tensor self, Scalar lambd=0.5) -> Tensor
+  self: softshrink_backward(grad, self, lambd)
+  result: auto_element_wise
+
+- name: threshold(Tensor self, Scalar threshold, Scalar value) -> Tensor
+  self: threshold_backward(grad, self, threshold)
+  result: auto_element_wise
+
+- name: threshold_(Tensor(a!) self, Scalar threshold, Scalar value) -> Tensor(a!)
+  self: threshold_backward(grad, self, threshold)
+  result: self_t.copy_(threshold_backward(self_t.conj(), original_self_p, threshold).conj())
+
+- name: reflection_pad1d(Tensor self, SymInt[2] padding) -> Tensor
+  self: reflection_pad1d_backward_symint(grad, self, padding)
+  result: auto_linear
+
+- name: reflection_pad2d(Tensor self, SymInt[4] padding) -> Tensor
+  self: reflection_pad2d_backward_symint(grad, self, padding)
+  result: auto_linear
+
+- name: reflection_pad3d(Tensor self, SymInt[6] padding) -> Tensor
+  self: reflection_pad3d_backward_symint(grad, self, padding)
+  result: auto_linear
+
+- name: replication_pad1d(Tensor self, SymInt[2] padding) -> Tensor
+  self: replication_pad1d_backward_symint(grad, self, padding)
+  result: auto_linear
+
+- name: replication_pad2d(Tensor self, SymInt[4] padding) -> Tensor
+  self: replication_pad2d_backward_symint(grad, self, padding)
+  result: auto_linear
+
+- name: replication_pad3d(Tensor self, SymInt[6] padding) -> Tensor
+  self: replication_pad3d_backward_symint(grad, self, padding)
+  result: auto_linear
+
+- name: upsample_linear1d(Tensor self, SymInt[1] output_size, bool align_corners, float? scales=None) -> Tensor
+  self: upsample_linear1d_backward_symint(grad, output_size, self.sym_sizes(), align_corners, scales)
+  result: auto_linear
+
+- name: upsample_bilinear2d(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor
+  self: upsample_bilinear2d_backward_symint(grad, output_size, self.sym_sizes(), align_corners, scales_h, scales_w)
+  result: auto_linear
+
+- name: _upsample_bilinear2d_aa(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor
+  self: _upsample_bilinear2d_aa_backward_symint(grad, output_size, self.sym_sizes(), align_corners, scales_h, scales_w)
+  result: auto_linear
+
+- name: upsample_bicubic2d(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor
+  self: upsample_bicubic2d_backward_symint(grad, output_size, self.sym_sizes(), align_corners, scales_h, scales_w)
+  result: auto_linear
+
+- name: _upsample_bicubic2d_aa(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor
+  self: _upsample_bicubic2d_aa_backward_symint(grad, output_size, self.sym_sizes(), align_corners, scales_h, scales_w)
+  result: auto_linear
+
+- name: upsample_trilinear3d(Tensor self, SymInt[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor
+  self: upsample_trilinear3d_backward_symint(grad, output_size, self.sym_sizes(), align_corners, scales_d, scales_h, scales_w)
+  result: auto_linear
+
+- name: upsample_nearest1d(Tensor self, SymInt[1] output_size, float? scales=None) -> Tensor
+  self: upsample_nearest1d_backward_symint(grad, output_size, self.sym_sizes(), scales)
+  result: auto_linear
+
+- name: _upsample_nearest_exact1d(Tensor self, SymInt[1] output_size, float? scales=None) -> Tensor
+  self: _upsample_nearest_exact1d_backward_symint(grad, output_size, self.sym_sizes(), scales)
+  result: auto_linear
+
+- name: upsample_nearest2d(Tensor self, SymInt[2] output_size, float? scales_h=None, float? scales_w=None) -> Tensor
+  self: upsample_nearest2d_backward_symint(grad, output_size, self.sym_sizes(), scales_h, scales_w)
+  result: auto_linear
+
+- name: _upsample_nearest_exact2d(Tensor self, SymInt[2] output_size, float? scales_h=None, float? scales_w=None) -> Tensor
+  self: _upsample_nearest_exact2d_backward_symint(grad, output_size, self.sym_sizes(), scales_h, scales_w)
+  result: auto_linear
+
+- name: upsample_nearest3d(Tensor self, SymInt[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor
+  self: upsample_nearest3d_backward_symint(grad, output_size, self.sym_sizes(), scales_d, scales_h, scales_w)
+  result: auto_linear
+
+- name: _upsample_nearest_exact3d(Tensor self, SymInt[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor
+  self: _upsample_nearest_exact3d_backward_symint(grad, output_size, self.sym_sizes(), scales_d, scales_h, scales_w)
+  result: auto_linear
+
+- name: pixel_shuffle(Tensor self, int upscale_factor) -> Tensor
+  self: pixel_unshuffle(grad, upscale_factor)
+  result: auto_linear
+
+- name: pixel_unshuffle(Tensor self, int downscale_factor) -> Tensor
+  self: pixel_shuffle(grad, downscale_factor)
+  result: auto_linear
+
+- name: channel_shuffle(Tensor self, SymInt groups) -> Tensor
+  self: channel_shuffle_symint(grad, grad.sym_size(1) / groups)
+  result: auto_linear
+
+- name: _adaptive_avg_pool2d(Tensor self, SymInt[2] output_size) -> Tensor
+  self: _adaptive_avg_pool2d_backward(grad, self)
+  result: auto_linear
+
+- name: _adaptive_avg_pool3d(Tensor self, SymInt[3] output_size) -> Tensor
+  self: _adaptive_avg_pool3d_backward(grad, self)
+  result: auto_linear
+
+- name: adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
+  self: adaptive_max_pool2d_backward(grad, self, result1)
+  result0: gather(self_t.flatten(-2), -1, result1.flatten(-2)).view_as(result1)
+  output_differentiability: [True, False]
+
+- name: adaptive_max_pool3d(Tensor self, int[3] output_size) -> (Tensor, Tensor)
+  self: adaptive_max_pool3d_backward(grad, self, result1)
+  result0: gather(self_t.flatten(-3), -1, result1.flatten(-3)).view_as(result1)
+  output_differentiability: [True, False]
+
+- name: avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> Tensor
+  self: avg_pool2d_backward(grad, self, kernel_size, stride, padding, ceil_mode, count_include_pad, divisor_override)
+  result: auto_linear
+
+- name: avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=0, bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> Tensor
+  self: avg_pool3d_backward(grad, self, kernel_size, stride, padding, ceil_mode, count_include_pad, divisor_override)
+  result: auto_linear
+
+- name: fractional_max_pool2d(Tensor self, int[2] kernel_size, int[2] output_size, Tensor random_samples) -> (Tensor, Tensor)
+  self: fractional_max_pool2d_backward(grad, self, kernel_size, output_size, result1)
+  result0: gather(self_t.flatten(-2), -1, result1.flatten(-2)).view_as(result1)
+  output_differentiability: [True, False]
+
+- name: fractional_max_pool3d(Tensor self, int[3] kernel_size, int[3] output_size, Tensor random_samples) -> (Tensor, Tensor)
+  self: fractional_max_pool3d_backward(grad, self, kernel_size, output_size, result1)
+  result0: gather(self_t.flatten(-3), -1, result1.flatten(-3)).view_as(result1)
+  output_differentiability: [True, False]
+
+- name: linear(Tensor input, Tensor weight, Tensor? bias=None) -> Tensor
+  input, weight, bias: "grad.defined() ? linear_backward(input, grad, weight, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+
+- name: linear_backward(Tensor self, Tensor grad_output, Tensor weight, bool[3] output_mask) -> (Tensor, Tensor, Tensor)
+  self, grad_output, weight: linear_double_backward(grads, self, grad_output, weight)
+
+# mps
+- name: max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, int[2] dilation=1, bool ceil_mode=False) -> Tensor
+  self: max_pool2d_backward(grad, self, kernel_size, stride, padding, dilation, ceil_mode)
+
+- name: _mps_convolution(Tensor self, Tensor weight, Tensor? bias, SymInt[] padding, SymInt[] stride, SymInt[] dilation, SymInt groups) -> Tensor
+  self, weight, bias: "grad.defined() ? mps_convolution_backward_symint(self, grad, weight, padding, stride, dilation, groups, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+
+- name: mps_convolution_backward(Tensor self, Tensor grad_output, Tensor weight, SymInt[] padding, SymInt[] stride, SymInt[] dilation, SymInt groups, bool[3] output_mask) -> (Tensor, Tensor, Tensor)
+  grad_output, self, weight: _convolution_double_backward_symint(grads[0], grads[1], grads[2], grad_output, weight, self, stride, padding, dilation, false, std::vector<c10::SymInt>(padding.size(), 0), groups, grad_input_mask)
+
+- name: max_pool2d_with_indices(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, int[2] dilation=1, bool ceil_mode=False) -> (Tensor, Tensor)
+  self: max_pool2d_with_indices_backward(grad, self, kernel_size, stride, padding, dilation, ceil_mode, result1)
+  result0: gather(self_t.flatten(-2), -1, result1.flatten(-2)).view_as(result1)
+  output_differentiability: [True, False]
+
+- name: max_pool3d_with_indices(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=0, int[3] dilation=1, bool ceil_mode=False) -> (Tensor, Tensor)
+  self: max_pool3d_with_indices_backward(grad, self, kernel_size, stride, padding, dilation, ceil_mode, result1)
+  result0: gather(self_t.flatten(-3), -1, result1.flatten(-3)).view_as(result1)
+  output_differentiability: [True, False]
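+
+# The result0 formulas above pick out the tangent at the argmax locations by
+# gathering on the returned indices; the same indexing can be reproduced in
+# user code to recover the pooled values from the indices. A minimal sketch,
+# assuming standard PyTorch (illustrative only):
+#
+#   import torch
+#   import torch.nn.functional as F
+#   x = torch.randn(1, 1, 4, 4)
+#   out, idx = F.max_pool2d(x, 2, return_indices=True)
+#   regathered = x.flatten(-2).gather(-1, idx.flatten(-2)).view_as(idx)
+#   assert torch.equal(out, regathered)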
+
+- name: max_unpool2d(Tensor self, Tensor indices, SymInt[2] output_size) -> Tensor
+  self: max_pool_double_backward(grad, indices, 2)
+  indices: non_differentiable
+  result: auto_linear
+
+- name: max_unpool3d(Tensor self, Tensor indices, SymInt[3] output_size, int[3] stride, int[3] padding) -> Tensor
+  self: max_pool_double_backward(grad, indices, 3)
+  indices: non_differentiable
+  result: auto_linear
+
+- name: convolution(Tensor input, Tensor weight, Tensor? bias, SymInt[] stride, SymInt[] padding, SymInt[] dilation, bool transposed, SymInt[] output_padding, SymInt groups) -> Tensor
+  input, weight, bias: "grad.defined() ? convolution_backward_symint(grad, input, weight, bias->sym_sizes(), stride, padding, dilation, transposed, output_padding, groups, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+  result: convolution_jvp(input_p, input_t, weight_p, weight_t, bias_p, bias_t, stride, padding, dilation, transposed, output_padding, groups)
+
+# TorchScript serializes calls to _convolution so this entry is present until that is changed to use convolution.
+# Note that the benchmark, deterministic, cudnn_enabled, and allow_tf32 flags are queried from the global context
+# by convolution_backward instead of being passed along from the forward pass.
+- name: _convolution(Tensor input, Tensor weight, Tensor? bias, SymInt[] stride, SymInt[] padding, SymInt[] dilation, bool transposed, SymInt[] output_padding, SymInt groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> Tensor
+  input, weight, bias: "grad.defined() ? convolution_backward_symint(grad, input, weight, bias->sym_sizes(), stride, padding, dilation, transposed, output_padding, groups, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+  result: _convolution_jvp(input_p, input_t, weight_p, weight_t, bias_p, bias_t, stride, padding, dilation, transposed, output_padding, groups, benchmark, deterministic, cudnn_enabled, allow_tf32)
+
+- name: convolution_backward(Tensor grad_output, Tensor input, Tensor weight, SymInt[]? bias_sizes, SymInt[] stride, SymInt[] padding, SymInt[] dilation, bool transposed, SymInt[] output_padding, SymInt groups, bool[3] output_mask) -> (Tensor, Tensor, Tensor)
+  grad_output, input, weight: _convolution_double_backward_symint(grads[0], grads[1], grads[2], grad_output, weight, input, stride, padding, dilation, transposed, output_padding, groups, grad_input_mask)
+  result0: std::get<0>(convolution_backward_symint(grad_output_p, input_p, weight_t, bias_sizes, stride, padding, dilation, transposed, output_padding, groups, {true, false, false})) + std::get<0>(convolution_backward_symint(grad_output_t, input_p, weight_p, bias_sizes, stride, padding, dilation, transposed, output_padding, groups, {true, false, false}))
+  result1: std::get<1>(convolution_backward_symint(grad_output_p, input_t, weight_p, bias_sizes, stride, padding, dilation, transposed, output_padding, groups, {false, true, false})) + std::get<1>(convolution_backward_symint(grad_output_t, input_p, weight_p, bias_sizes, stride, padding, dilation, transposed, output_padding, groups, {false, true, false}))
+  result2: convolution_backward_jvp_grad_bias(grad_output_t, result2)
+
+- name: convolution_overrideable(Tensor input, Tensor weight, Tensor? bias, SymInt[] stride, SymInt[] padding, SymInt[] dilation, bool transposed, SymInt[] output_padding, SymInt groups) -> Tensor
+  input, weight, bias: "grad.defined() ? convolution_backward_overrideable_symint(grad, input, weight, stride, padding, dilation, transposed, output_padding, groups, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+
+- name: convolution_backward_overrideable(Tensor grad_output, Tensor input, Tensor weight, SymInt[] stride, SymInt[] padding, SymInt[] dilation, bool transposed, SymInt[] output_padding, SymInt groups, bool[3] output_mask) -> (Tensor grad_input, Tensor grad_weight, Tensor grad_bias)
+  grad_output, input, weight: _convolution_double_backward_symint(grads[0], grads[1], grads[2], grad_output, weight, input, stride, padding, dilation, transposed, output_padding, groups, grad_input_mask)
+
+- name: slow_conv_transpose2d(Tensor self, Tensor weight, SymInt[2] kernel_size, Tensor? bias=None, SymInt[2] stride=1, SymInt[2] padding=0, SymInt[2] output_padding=0, SymInt[2] dilation=1) -> Tensor
+  self, weight, bias: "grad.defined() ? convolution_backward_symint(grad, self, weight, bias->sym_sizes(), stride, padding, dilation, true, output_padding, 1, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+
+- name: slow_conv_transpose3d(Tensor self, Tensor weight, SymInt[3] kernel_size, Tensor? bias=None, SymInt[3] stride=1, SymInt[3] padding=0, SymInt[3] output_padding=0, SymInt[3] dilation=1) -> Tensor
+  self, weight, bias: "grad.defined() ? convolution_backward_symint(grad, self, weight, bias->sym_sizes(), stride, padding, dilation, true, output_padding, 1, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+
+- name: _slow_conv2d_forward(Tensor self, Tensor weight, SymInt[2] kernel_size, Tensor? bias, SymInt[2] stride, SymInt[2] padding) -> Tensor
+  self, weight, bias: "grad.defined() ? _slow_conv2d_backward_symint(grad, self, weight, kernel_size, stride, padding, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+
+- name: _slow_conv2d_backward.output_mask(Tensor grad_output, Tensor self, Tensor weight, SymInt[2] kernel_size, SymInt[2] stride, SymInt[2] padding, bool[3] output_mask) -> (Tensor grad_input, Tensor grad_weight, Tensor grad_bias)
+  grad_output, self, weight: _convolution_double_backward_symint(grads[0], grads[1], grads[2], grad_output, weight, self, stride, padding, {{1, 1}}, false, {{0, 0}}, 1, grad_input_mask)
+
+- name: _conv_depthwise2d(Tensor self, Tensor weight, SymInt[2] kernel_size, Tensor? bias, SymInt[2] stride, SymInt[2] padding, SymInt[2] dilation) -> Tensor
+  self, weight, bias: "grad.defined() ? convolution_backward_symint(grad.contiguous(), self, weight, bias->sym_sizes(), stride, padding, dilation, /*transposed=*/ false, /*output_padding=*/ {{0, 0}}, /*groups=*/ 1, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+
+- name: conv_depthwise3d(Tensor self, Tensor weight, SymInt[3] kernel_size, Tensor? bias, SymInt[3] stride, SymInt[3] padding, SymInt[3] dilation) -> Tensor
+  self, weight, bias: "grad.defined() ? convolution_backward_symint(grad.contiguous(), self, weight, bias->sym_sizes(), stride, padding, dilation, /*transposed=*/ false, /*output_padding=*/ {{0, 0, 0}}, /*groups=*/ 1, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+
+- name: slow_conv3d_forward(Tensor self, Tensor weight, SymInt[3] kernel_size, Tensor? bias, SymInt[3] stride, SymInt[3] padding) -> Tensor
+  self, weight, bias: "grad.defined() ? convolution_backward_symint(grad, self, weight, bias->sym_sizes(), stride, padding, /*dilation=*/ {{1, 1, 1}}, false, /*output_padding=*/ {{0, 0, 0}}, 1, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+
+- name: slow_conv_dilated2d(Tensor self, Tensor weight, SymInt[2] kernel_size, Tensor? bias=None, SymInt[2] stride=1, SymInt[2] padding=0, SymInt[2] dilation=1) -> Tensor
+  self, weight, bias: "grad.defined() ? convolution_backward_symint(grad, self, weight, bias->sym_sizes(), stride, padding, dilation, false, std::vector<c10::SymInt>(padding.size(), 0), 1, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+
+- name: slow_conv_dilated3d(Tensor self, Tensor weight, SymInt[3] kernel_size, Tensor? bias=None, SymInt[3] stride=1, SymInt[3] padding=0, SymInt[3] dilation=1) -> Tensor
+  self, weight, bias: "grad.defined() ? convolution_backward_symint(grad, self, weight, bias->sym_sizes(), stride, padding, dilation, false, std::vector<c10::SymInt>(padding.size(), 0), 1, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
+
+- name: col2im(Tensor self, SymInt[2] output_size, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride) -> Tensor
+  self: im2col(grad, kernel_size, dilation, padding, stride)
+  result: auto_linear
+
+- name: im2col(Tensor self, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride) -> Tensor
+  self: col2im_symint(grad, {self.sym_size(-2), self.sym_size(-1)}, kernel_size, dilation, padding, stride)
+  result: auto_linear
+
+- name: _adaptive_avg_pool2d_backward(Tensor grad_output, Tensor self) -> Tensor
+  grad_output: _adaptive_avg_pool2d_symint(grad, {grad_output.sym_size(-2), grad_output.sym_size(-1)})
+  self: zeros_like(self)
+  result: _adaptive_avg_pool2d_backward(grad_output_t, self_p)
+
+- name: _adaptive_avg_pool3d_backward(Tensor grad_output, Tensor self) -> Tensor
+  grad_output: _adaptive_avg_pool3d_symint(grad, { grad_output.sym_size(-3), grad_output.sym_size(-2), grad_output.sym_size(-1) })
+  self: zeros_like(self)
+  result: _adaptive_avg_pool3d_backward(grad_output_t, self_p)
+
+- name: adaptive_max_pool2d_backward(Tensor grad_output, Tensor self, Tensor indices) -> Tensor
+  grad_output: max_pool_double_backward(grad, indices, 2)
+  self: zeros_like(self)
+  result: auto_linear
+
+- name: adaptive_max_pool3d_backward(Tensor grad_output, Tensor self, Tensor indices) -> Tensor
+  grad_output: max_pool_double_backward(grad, indices, 3)
+  self: zeros_like(self)
+  result: auto_linear
+
+- name: avg_pool2d_backward(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride, int[2] padding, bool ceil_mode, bool count_include_pad, int? divisor_override) -> Tensor
+  grad_output: avg_pool2d(grad, kernel_size, stride, padding, ceil_mode, count_include_pad, divisor_override)
+  self: zeros_like(self)
+  result: avg_pool2d_backward(grad_output_t, self_p, kernel_size, stride, padding, ceil_mode, count_include_pad, divisor_override)
+
+- name: avg_pool3d_backward(Tensor grad_output, Tensor self, int[3] kernel_size, int[3] stride, int[3] padding, bool ceil_mode, bool count_include_pad, int? divisor_override) -> Tensor
+  grad_output: avg_pool3d(grad, kernel_size, stride, padding, ceil_mode, count_include_pad, divisor_override)
+  self: zeros_like(self)
+  result: avg_pool3d_backward(grad_output_t, self_p, kernel_size, stride, padding, ceil_mode, count_include_pad, divisor_override)
+
+- name: elu_backward(Tensor grad_output, Scalar alpha, Scalar scale, Scalar input_scale, bool is_result, Tensor self_or_result) -> Tensor
+  grad_output: elu_backward(grad, alpha, scale, input_scale, is_result, self_or_result)
+  self_or_result: elu_double_backward(grad, grad_output, alpha, scale, input_scale, is_result, self_or_result)
+  result: elu_backward(grad_output_t, alpha, scale, input_scale, is_result, self_or_result_p) + elu_double_backward(self_or_result_t, grad_output_p, alpha, scale, input_scale, is_result, self_or_result_p)
+
+- name: fractional_max_pool2d_backward(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] output_size, Tensor indices) -> Tensor
+  grad_output: max_pool_double_backward(grad, indices, 2)
+  self: zeros_like(self)
+  result: auto_linear
+
+- name: fractional_max_pool3d_backward(Tensor grad_output, Tensor self, int[3] kernel_size, int[3] output_size, Tensor indices) -> Tensor
+  grad_output: max_pool_double_backward(grad, indices, 3)
+  self: zeros_like(self)
+  result: auto_linear
+
+- name: glu_backward(Tensor grad_output, Tensor self, int dim) -> Tensor
+  grad_output: glu_double_backward_grad_output(grad, self, dim)
+  self: glu_double_backward(grad, grad_output, self, dim)
+  result: glu_backward_jvp(result, grad_output_p, self_p, grad_output_t, self_t, dim)
+
+- name: hardtanh_backward(Tensor grad_output, Tensor self, Scalar min_val, Scalar max_val) -> Tensor
+  grad_output: hardtanh_backward(grad, self, min_val, max_val)
+  self: zeros_like(grad)
+  result: at::where((self_p > min_val).logical_and(self_p < max_val), grad_output_t, at::zeros({}, result.options()).expand_as(result))
+
+- name: log_sigmoid_backward(Tensor grad_output, Tensor self, Tensor buffer) -> Tensor
+  grad_output: log_sigmoid_backward(grad, self, buffer)
+  self: log_sigmoid_double_backward(grad * grad_output, self)
+  result: log_sigmoid_backward(grad_output_t, self_p, buffer) + log_sigmoid_double_backward(self_t * grad_output_p, self_p)
+
+- name: _log_softmax_backward_data(Tensor grad_output, Tensor output, int dim, ScalarType input_dtype) -> Tensor
+  grad_output: grad.to(output.dtype()) - (grad.to(output.dtype()) * output.exp()).sum(dim, true)
+  output: (-grad_output.sum(dim, true) * output.exp() * grad.to(output.dtype())).to(output.dtype())
+
+- name: leaky_relu_backward(Tensor grad_output, Tensor self, Scalar negative_slope, bool self_is_result) -> Tensor
+  # self_is_result is always false here since double backward call is an out-of-place call, self is input itself
+  grad_output: leaky_relu_backward(grad, self, negative_slope, false)
+  self: zeros_like(grad)
+  # leaky_relu_backward(grad_output, self, negative_slope, false)
+  # computes grad_output * at::where(self_p > 0, 1, negative_slope)
+  # so the jvp formula is the following:
+  # grad_output_t * at::where(self_p > 0, self_p.new_ones([]), negative_slope);
+  #
+  # leaky_relu_backward(grad_output, result, negative_slope, true)
+  # computes grad_output * at::where(result > 0, 1, negative_slope)
+  # under the assumption that `negative_slope` is positive (otherwise,
+  # it is not possible to compute the gradient).
+  #
+  # so the jvp formula is the following:
+  # grad_output_t * at::where(result_p > 0, result_p.new_ones([]), negative_slope);
+  # with the assumption that negative_slope is positive.
+  #
+  # Combined together that results in the following optimized kernel which
+  # also checks the assumption that negative_slope is positive when self_is_result
+  # is True:
+  result: leaky_relu_backward(grad_output_t, self_p, negative_slope, self_is_result)
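+
+# A minimal sketch of the first-order piece of the double backward described
+# above, assuming standard PyTorch (illustrative only): the gradient of
+# leaky_relu w.r.t. its input is where(x > 0, 1, negative_slope):
+#
+#   import torch
+#   import torch.nn.functional as F
+#   x = torch.randn(6, requires_grad=True)
+#   y = F.leaky_relu(x, negative_slope=0.1)
+#   (g,) = torch.autograd.grad(y.sum(), x, create_graph=True)
+#   expected = torch.where(x > 0, torch.ones_like(x), torch.full_like(x, 0.1))
+#   assert torch.allclose(g, expected)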
+
+# This derivative is mps-only, and `error_for_max_pool2d_double_backward` just raises an error.
+- name: max_pool2d_backward(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, int[2] dilation=1, bool ceil_mode=False) -> Tensor
+  grad_output: error_for_max_pool2d_double_backward()
+  self: zeros_like(self)
+  result: auto_linear
+
+- name: max_pool2d_with_indices_backward(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride, int[2] padding, int[2] dilation, bool ceil_mode, Tensor indices) -> Tensor
+  grad_output: max_pool_double_backward(grad, indices, 2)
+  self: zeros_like(self)
+  indices: non_differentiable
+  result: auto_linear
+
+- name: max_pool3d_with_indices_backward(Tensor grad_output, Tensor self, int[3] kernel_size, int[3] stride, int[3] padding, int[3] dilation, bool ceil_mode, Tensor indices) -> Tensor
+  grad_output: max_pool_double_backward(grad, indices, 3)
+  self: zeros_like(self)
+  indices: non_differentiable
+  result: auto_linear
+
+- name: mse_loss_backward(Tensor grad_output, Tensor self, Tensor target, int reduction) -> Tensor
+  grad_output: mse_loss_backward(grad, self, target, reduction)
+  self: mse_loss_double_backward(grad * grad_output, self, reduction)
+  target: -mse_loss_double_backward(grad * grad_output, target, reduction)
+  result: "mse_loss_double_backward(self_t * grad_output_p, self_p, reduction)
+           - mse_loss_double_backward(target_t * grad_output_p, target_p, reduction)
+           + mse_loss_backward(grad_output_t, self_p, target_p, reduction)"
+
+- name: nll_loss_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index, Tensor total_weight) -> Tensor
+  grad_output: nll_loss_symint(grad, target, weight, reduction, ignore_index)
+  self: zeros_like(grad)
+  target: non_differentiable
+
+- name: nll_loss2d_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index, Tensor total_weight) -> Tensor
+  grad_output: nll_loss2d_symint(grad, target, weight, reduction, ignore_index)
+  self: zeros_like(grad)
+  target: non_differentiable
+
+- name: rrelu_with_noise_backward(Tensor grad_output, Tensor self, Tensor noise, Scalar lower, Scalar upper, bool training, bool self_is_result) -> Tensor
+  # self_is_result is always false here since double backward call is an out-of-place call, self is input itself
+  grad_output: rrelu_with_noise_backward(grad, self, noise, lower, upper, training, false)
+  self: zeros_like(grad)
+  result: rrelu_with_noise_backward(grad_output_t, self_p, noise, lower, upper, training, false)
+
+- name: reflection_pad1d_backward(Tensor grad_output, Tensor self, SymInt[2] padding) -> Tensor
+  grad_output: reflection_pad1d_symint(grad, padding)
+  self: zeros_like(self)
+  result: reflection_pad1d_backward_symint(grad_output_t, self_p, padding)
+
+- name: reflection_pad2d_backward(Tensor grad_output, Tensor self, SymInt[4] padding) -> Tensor
+  grad_output: reflection_pad2d_symint(grad, padding)
+  self: zeros_like(self)
+  result: reflection_pad2d_backward_symint(grad_output_t, self_p, padding)
+
+- name: reflection_pad3d_backward(Tensor grad_output, Tensor self, SymInt[6] padding) -> Tensor
+  grad_output: reflection_pad3d_symint(grad, padding)
+  self: zeros_like(self)
+  result: reflection_pad3d_backward_symint(grad_output_t, self_p, padding)
+
+- name: replication_pad1d_backward(Tensor grad_output, Tensor self, SymInt[2] padding) -> Tensor
+  grad_output: replication_pad1d_symint(grad, padding)
+  self: zeros_like(self)
+  result: replication_pad1d_backward_symint(grad_output_t, self_p, padding)
+
+- name: replication_pad2d_backward(Tensor grad_output, Tensor self, SymInt[4] padding) -> Tensor
+  grad_output: replication_pad2d_symint(grad, padding)
+  self: zeros_like(self)
+  result: replication_pad2d_backward_symint(grad_output_t, self_p, padding)
+
+- name: replication_pad3d_backward(Tensor grad_output, Tensor self, SymInt[6] padding) -> Tensor
+  grad_output: replication_pad3d_symint(grad, padding)
+  self: zeros_like(self)
+  result: replication_pad3d_backward_symint(grad_output_t, self_p, padding)
+
+- name: sparse_sampled_addmm(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> Tensor
+  self, mat1, mat2: "sparse_sampled_addmm_backward(grad,
+                                                   self,
+                                                   wrap_opt_if(mat1, grad_input_mask[2]),
+                                                   wrap_opt_if(mat2, grad_input_mask[1]),
+                                                   alpha, beta, grad_input_mask)"
+
+- name: _sparse_mm_reduce_impl(Tensor self, Tensor other, str reduce) -> (Tensor, Tensor)
+  output_differentiability: [True, False]
+  self, other: "grad.defined() ? _sparse_mm_reduce_impl_backward(self, grad, other, reduce, result1, grad_input_mask) : std::tuple<Tensor, Tensor>()"
+
+- name: smooth_l1_loss_backward(Tensor grad_output, Tensor self, Tensor target, int reduction, float beta) -> Tensor
+  grad_output: smooth_l1_loss_backward(grad, self, target, reduction, beta)
+  self: smooth_l1_loss_double_backward(grad * grad_output, self, target, reduction, beta)
+  target: -smooth_l1_loss_double_backward(grad * grad_output, self, target, reduction, beta)
+  result: "smooth_l1_loss_double_backward(self_t * grad_output_p, self_p, target_p, reduction, beta)
+           - smooth_l1_loss_double_backward(target_t * grad_output_p, self_p, target_p, reduction, beta)
+           + smooth_l1_loss_backward(grad_output_t, self_p, target_p, reduction, beta)"
+
+- name: huber_loss_backward(Tensor grad_output, Tensor self, Tensor target, int reduction, float delta) -> Tensor
+  grad_output: huber_loss_double_backward_grad_output(grad, grad_output, self, target, reduction, delta)
+  self: huber_loss_double_backward(grad * grad_output, self, target, reduction, delta)
+  target: -huber_loss_double_backward(grad * grad_output, self, target, reduction, delta)
+
+- name: softplus_backward(Tensor grad_output, Tensor self, Scalar beta, Scalar threshold) -> Tensor
+  grad_output: softplus_backward(grad, self, beta, threshold)
+  self: softplus_double_backward(grad * grad_output, self, beta, threshold)
+  result: "softplus_backward(grad_output_t, self_p, beta, threshold)
+           + softplus_double_backward(self_t * grad_output_p, self_p, beta, threshold)"
+
+- name: _softmax_backward_data(Tensor grad_output, Tensor output, int dim, ScalarType input_dtype) -> Tensor
+  grad_output: _softmax_backward_data(grad.to(output.dtype()), output, dim, input_dtype)
+  output: softmax_double_backward(grad.to(output.dtype()), grad_output, dim, output).to(output.dtype())
+
+- name: soft_margin_loss_backward(Tensor grad_output, Tensor self, Tensor target, int reduction) -> Tensor
+  grad_output: soft_margin_loss_double_backward_grad_output(grad, grad_output, self, target, reduction)
+  self: soft_margin_loss_double_backward(grad * grad_output, self, target, reduction)
+
+- name: softshrink_backward(Tensor grad_output, Tensor self, Scalar lambd) -> Tensor
+  grad_output: softshrink_backward(grad, self, lambd)
+  self: zeros_like(grad)
+  result: at::where((self_p > lambd).logical_or(self_p < -lambd), grad_output_t, at::zeros({}, result.options()).expand_as(result))
+
+- name: threshold_backward(Tensor grad_output, Tensor self, Scalar threshold) -> Tensor
+  grad_output: threshold_backward(grad, self, threshold)
+  self: zeros_like(grad)
+  result: zeros_like(self_t) + threshold_backward(grad_output_t, self_p, threshold)
+
+- name: upsample_linear1d_backward(Tensor grad_output, SymInt[1] output_size, SymInt[3] input_size, bool align_corners, float? scales=None) -> Tensor
+  grad_output: upsample_linear1d_symint(grad, output_size, align_corners, scales)
+  result: auto_linear
+
+- name: upsample_bilinear2d_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor
+  grad_output: upsample_bilinear2d_symint(grad, output_size, align_corners, scales_h, scales_w)
+  result: auto_linear
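+
+# "result: auto_linear" in entries like the ones above says the op is linear
+# in its input, so its JVP is just the op applied to the tangent. A minimal
+# forward-mode AD sketch, assuming standard PyTorch (illustrative only):
+#
+#   import torch
+#   import torch.nn.functional as F
+#   f = lambda x: F.interpolate(x, scale_factor=2, mode="nearest")
+#   x, t = torch.randn(1, 1, 4, 4), torch.randn(1, 1, 4, 4)
+#   _, jvp = torch.func.jvp(f, (x,), (t,))
+#   assert torch.allclose(jvp, f(t))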
+
+- name: _upsample_bilinear2d_aa_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor
+  grad_output: _upsample_bilinear2d_aa_symint(grad, output_size, align_corners, scales_h, scales_w)
+  result: auto_linear
+
+- name: upsample_bicubic2d_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor
+  grad_output: upsample_bicubic2d_symint(grad, output_size, align_corners, scales_h, scales_w)
+  result: auto_linear
+
+- name: _upsample_bicubic2d_aa_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor
+  grad_output: _upsample_bicubic2d_aa_symint(grad, output_size, align_corners, scales_h, scales_w)
+  result: auto_linear
+
+- name: upsample_trilinear3d_backward(Tensor grad_output, SymInt[3] output_size, SymInt[5] input_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor
+  grad_output: upsample_trilinear3d_symint(grad, output_size, align_corners, scales_d, scales_h, scales_w)
+  result: auto_linear
+
+- name: upsample_nearest1d_backward(Tensor grad_output, SymInt[1] output_size, SymInt[3] input_size, float? scales=None) -> Tensor
+  grad_output: upsample_nearest1d_symint(grad, output_size, scales)
+  result: auto_linear
+
+- name: _upsample_nearest_exact1d_backward(Tensor grad_output, SymInt[1] output_size, SymInt[3] input_size, float? scales=None) -> Tensor
+  grad_output: _upsample_nearest_exact1d_symint(grad, output_size, scales)
+  result: auto_linear
+
+- name: upsample_nearest2d_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, float? scales_h=None, float? scales_w=None) -> Tensor
+  grad_output: upsample_nearest2d_symint(grad, output_size, scales_h, scales_w)
+  result: auto_linear
+
+- name: _upsample_nearest_exact2d_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, float? scales_h=None, float? scales_w=None) -> Tensor
+  grad_output: _upsample_nearest_exact2d_symint(grad, output_size, scales_h, scales_w)
+  result: auto_linear
+
+- name: upsample_nearest3d_backward(Tensor grad_output, SymInt[3] output_size, SymInt[5] input_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor
+  grad_output: upsample_nearest3d_symint(grad, output_size, scales_d, scales_h, scales_w)
+  result: auto_linear
+
+- name: _upsample_nearest_exact3d_backward(Tensor grad_output, SymInt[3] output_size, SymInt[5] input_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor
+  grad_output: _upsample_nearest_exact3d_symint(grad, output_size, scales_d, scales_h, scales_w)
+  result: auto_linear
+
+- name: sigmoid_backward(Tensor grad_output, Tensor output) -> Tensor
+  grad_output: sigmoid_backward(grad, output.conj())
+  output: grad.conj() * grad_output * (-2 * output.conj() + 1)
+  result: sigmoid_backward(grad_output_t, output_p) + output_t.conj() * grad_output_p * (-2 * output_p.conj() + 1)
+
+- name: tanh_backward(Tensor grad_output, Tensor output) -> Tensor
+  grad_output: tanh_backward(grad, output.conj())
+  output: grad.conj() * (-2 * output.conj() * grad_output)
+  result: tanh_backward(grad_output_t, output_p) + output_t.conj() * (-2 * output_p.conj() * grad_output_p)
+
+# cudnn
+- name: _cudnn_ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank, bool deterministic, bool zero_infinity) -> (Tensor, Tensor)
+  log_probs: _cudnn_ctc_loss_backward(grad, result0, result1, zero_infinity)
+
+- name: _cudnn_ctc_loss.Tensor(Tensor log_probs, Tensor targets, Tensor input_lengths, Tensor target_lengths, int blank, bool deterministic, bool zero_infinity) -> (Tensor, Tensor)
+  log_probs: _cudnn_ctc_loss_backward(grad, result0, result1, zero_infinity)
+
+- name: cudnn_convolution_transpose(Tensor self, Tensor weight, SymInt[] padding, SymInt[] output_padding, SymInt[] stride, SymInt[] dilation, SymInt groups, bool benchmark, bool deterministic, bool allow_tf32) -> Tensor
+  self, weight: "_cudnn_convolution_backward(self, grad, weight, padding, output_padding, stride, dilation, true, groups, {grad_input_mask[0], grad_input_mask[1]})"
+
+- name: _mps_convolution_transpose(Tensor self, Tensor weight, SymInt[] padding, SymInt[] output_padding, SymInt[] stride, SymInt[] dilation, SymInt groups) -> Tensor
+  self, weight: "grad.defined() ? mps_convolution_transpose_backward_symint(self, grad, weight, padding, output_padding, stride, dilation, groups, grad_input_mask) : std::tuple<Tensor, Tensor>()"
+
+- name: cudnn_convolution(Tensor self, Tensor weight, SymInt[] padding, SymInt[] stride, SymInt[] dilation, SymInt groups, bool benchmark, bool deterministic, bool allow_tf32) -> Tensor
+  self, weight: "_cudnn_convolution_backward(self, grad, weight, padding, std::vector<c10::SymInt>(padding.size(), 0), stride, dilation, false, groups, {grad_input_mask[0], grad_input_mask[1]})"
+
+- name: cudnn_grid_sampler(Tensor self, Tensor grid) -> Tensor output
+  self, grid: "grad.defined() ? cudnn_grid_sampler_backward(self, grid, grad) : std::tuple<Tensor, Tensor>()"
+
+- name: cudnn_affine_grid_generator(Tensor theta, int N, int C, int H, int W) -> Tensor grid
+  theta: cudnn_affine_grid_generator_backward(grad, N, C, H, W)
+
+# NB: Why is the backwards here so complicated? CuDNN cannot be used to compute
+# backward in evaluation mode, because the math for backward in evaluation mode
+# is different (since the forward math is different), and CuDNN does not support
+# it. And in any case, you shouldn't be using this bn in evaluation mode,
+# because it should be merged into the previous convolution (left for future
+# work.)
+# NB2: The quotes around the gradient are needed to appease YAML parsing rules.
+- name: cudnn_batch_norm(Tensor input, Tensor weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float exponential_average_factor, float epsilon) -> (Tensor, Tensor, Tensor, Tensor)
+  input, weight, bias: "grad.defined() ? (training ? cudnn_batch_norm_backward(input, grad.contiguous(input.suggest_memory_format()), weight, running_mean, running_var, result1, result2, epsilon, retain_variables ? result3.clone() : result3) : native_batch_norm_backward(grad, input, weight, running_mean, running_var, result1, result2, training, epsilon, grad_input_mask)) : std::tuple<Tensor, Tensor, Tensor>()"
+  result0: batch_norm_jvp(input_p, input_t, weight_p, weight_t, bias_p, bias_t, running_mean, running_var, result1, result2, training, epsilon)
+
+# HACK: save_mean and save_var are going to be passed in as
+# requires_grad variables (even though we'll never backprop through
+# them) so we need to prevent the unpacking from triggering an error.
+- name: cudnn_batch_norm_backward(Tensor input, Tensor grad_output, Tensor weight, Tensor? running_mean, Tensor? running_var, Tensor? save_mean, Tensor? save_var, float epsilon, Tensor reserveSpace) -> (Tensor, Tensor, Tensor)
+  save_mean: not_implemented("cudnn_batch_norm_backward save_mean")
+  save_var: not_implemented("cudnn_batch_norm_backward save_var")
+  reserveSpace: not_implemented("cudnn_batch_norm_backward reserveSpace")
+  input, weight, grad_output: batchnorm_double_backward(input, weight, grads[0], grads[1], grads[2], grad_output, running_mean, running_var, true, epsilon, save_mean, save_var, grad_input_mask)
+
+# nnpack
+
+- name: _nnpack_spatial_convolution(Tensor input, Tensor weight, Tensor? bias, SymInt[2] padding, SymInt[2] stride=1) -> Tensor
+  # NNPACK does not support strided convolutions in the backwards path, which is the reason why we are using the closest available function that does here.
+  input, weight, bias: "grad.defined() ? convolution_backward_symint(grad, input, weight, bias->sym_sizes(), stride, padding, std::vector<c10::SymInt>(padding.size(), 1), false, std::vector<c10::SymInt>(padding.size(), 0), 1, grad_input_mask) : std::tuple<Tensor, Tensor, Tensor>()"
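+
+# The convolution entries above all funnel into convolution_backward; the
+# same input gradient is reachable from Python through torch.nn.grad. A
+# minimal sketch, assuming standard PyTorch (illustrative only):
+#
+#   import torch
+#   import torch.nn.functional as F
+#   x = torch.randn(1, 3, 8, 8, requires_grad=True)
+#   w = torch.randn(4, 3, 3, 3)
+#   go = torch.randn(1, 4, 6, 6)
+#   F.conv2d(x, w).backward(go)
+#   gi = torch.nn.grad.conv2d_input(x.shape, w, go)
+#   assert torch.allclose(x.grad, gi)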
result3.clone() : result3, grad_input_mask)" + +- name: _cudnn_rnn_backward(Tensor input, Tensor[] weight, int weight_stride0, Tensor weight_buf, Tensor hx, Tensor? cx, Tensor output, Tensor? grad_output, Tensor? grad_hy, Tensor? grad_cy, int mode, SymInt hidden_size, SymInt proj_size, int num_layers, bool batch_first, float dropout, bool train, bool bidirectional, SymInt[] batch_sizes, Tensor? dropout_state, Tensor reserve, bool[4] output_mask) -> (Tensor, Tensor, Tensor, Tensor[]) + dropout_state: non_differentiable + input: not_implemented("_cudnn_rnn_backward", kCudnnDoubleBackwardMsg) + weight: not_implemented_list("_cudnn_rnn_backward", kCudnnDoubleBackwardMsg) + hx: not_implemented("_cudnn_rnn_backward", kCudnnDoubleBackwardMsg) + cx: not_implemented("_cudnn_rnn_backward", kCudnnDoubleBackwardMsg) + output: not_implemented("_cudnn_rnn_backward", kCudnnDoubleBackwardMsg) + grad_output: not_implemented("_cudnn_rnn_backward", kCudnnDoubleBackwardMsg) + grad_hy: not_implemented("_cudnn_rnn_backward", kCudnnDoubleBackwardMsg) + grad_cy: not_implemented("_cudnn_rnn_backward", kCudnnDoubleBackwardMsg) + +# miopen + +- name: miopen_convolution_transpose(Tensor self, Tensor weight, Tensor? bias, SymInt[] padding, SymInt[] output_padding, SymInt[] stride, SymInt[] dilation, SymInt groups, bool benchmark, bool deterministic) -> Tensor + self, weight, bias: "grad.defined() ? convolution_backward_symint(grad, self, weight, bias->sym_sizes(), stride, padding, dilation, true, output_padding, groups, grad_input_mask) : std::tuple()" + +- name: miopen_convolution(Tensor self, Tensor weight, Tensor? bias, SymInt[] padding, SymInt[] stride, SymInt[] dilation, SymInt groups, bool benchmark, bool deterministic) -> Tensor + self, weight, bias: "grad.defined() ? convolution_backward_symint(grad, self, weight, bias->sym_sizes(), stride, padding, dilation, false, std::vector(padding.size(), 0), groups, grad_input_mask) : std::tuple()" + +- name: miopen_depthwise_convolution(Tensor self, Tensor weight, Tensor? bias, SymInt[] padding, SymInt[] stride, SymInt[] dilation, SymInt groups, bool benchmark, bool deterministic) -> Tensor + self, weight, bias: "grad.defined() ? convolution_backward_symint(grad, self, weight, bias->sym_sizes(), stride, padding, dilation, false, std::vector(padding.size(), 0), groups, grad_input_mask) : std::tuple()" + +- name: miopen_batch_norm(Tensor input, Tensor weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float exponential_average_factor, float epsilon) -> (Tensor, Tensor, Tensor) + input, weight, bias: "grad.defined() ? (training ? miopen_batch_norm_backward(input, grad.contiguous(input.suggest_memory_format()), weight, running_mean, running_var, result1, result2, epsilon) : native_batch_norm_backward(grad, input, weight, running_mean, running_var, result1, result2, training, epsilon, grad_input_mask)) : std::tuple()" + result0: batch_norm_jvp(input_p, input_t, weight_p, weight_t, bias_p, bias_t, running_mean, running_var, result1, result2, training, epsilon) + +- name: miopen_batch_norm_backward(Tensor input, Tensor grad_output, Tensor weight, Tensor? running_mean, Tensor? running_var, Tensor? save_mean, Tensor? 
save_var, float epsilon) -> (Tensor, Tensor, Tensor) + save_mean: not_implemented("miopen_batch_norm_backward save_mean") + save_var: not_implemented("miopen_batch_norm_backward save_var") + input, weight, grad_output: batchnorm_double_backward(input, weight, grads[0], grads[1], grads[2], grad_output, running_mean, running_var, true, epsilon, save_mean, save_var, grad_input_mask) + +- name: miopen_rnn(Tensor input, Tensor[] weight, int weight_stride0, Tensor hx, Tensor? cx, int mode, int hidden_size, int num_layers, bool batch_first, float dropout, bool train, bool bidirectional, int[] batch_sizes, Tensor? dropout_state) -> (Tensor, Tensor, Tensor, Tensor, Tensor) + dropout_state: non_differentiable + output_differentiability: [True, True, True, False, False] + input, hx, cx, weight: "miopen_rnn_backward(input, weight, weight_stride0, result4, hx, cx, result0, grads[0], grads[1], grads[2], mode, hidden_size, num_layers, batch_first, dropout, train, bidirectional, batch_sizes, dropout_state, retain_variables ? result3.clone() : result3, grad_input_mask)" + +- name: miopen_rnn_backward(Tensor input, Tensor[] weight, int weight_stride0, Tensor weight_buf, Tensor hx, Tensor? cx, Tensor output, Tensor? grad_output, Tensor? grad_hy, Tensor? grad_cy, int mode, int hidden_size, int num_layers, bool batch_first, float dropout, bool train, bool bidirectional, int[] batch_sizes, Tensor? dropout_state, Tensor reserve, bool[4] output_mask) -> (Tensor, Tensor, Tensor, Tensor[]) + dropout_state: non_differentiable + +- name: mkldnn_rnn_layer(Tensor input, Tensor weight0, Tensor weight1, Tensor weight2, Tensor weight3, Tensor hx_, Tensor cx_, bool reverse, int[] batch_sizes, int mode, int hidden_size, int num_layers, bool has_biases, bool bidirectional, bool batch_first, bool train) -> (Tensor, Tensor, Tensor, Tensor) + output_differentiability: [True, True, True, False] + input, weight0, weight1, weight2, weight3, hx_, cx_: "GradMode::is_enabled() ? mkldnn_rnn_layer_differentiable_backward(input, weight0, weight1, weight2, weight3, hx_, cx_, result0, result1, result2, grads[0], grads[1], grads[2], reverse, mode, hidden_size, num_layers, has_biases, train, bidirectional, batch_sizes, batch_first, result3) : mkldnn_rnn_layer_backward(input, weight0, weight1, weight2, weight3, hx_, cx_, result0, result1, result2, grads[0], grads[1], grads[2], reverse, mode, hidden_size, num_layers, has_biases, train, bidirectional, batch_sizes, batch_first, result3)" + +- name: mkldnn_rnn_layer_backward(Tensor input, Tensor weight1, Tensor weight2, Tensor weight3, Tensor weight4, Tensor hx_, Tensor cx_tmp, Tensor output, Tensor hy_, Tensor cy_, Tensor? grad_output, Tensor? grad_hy, Tensor? grad_cy, bool reverse, int mode, int hidden_size, int num_layers, bool has_biases, bool train, bool bidirectional, int[] batch_sizes, bool batch_first, Tensor workspace) -> (Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor) + +# mkldnn +- name: mkldnn_convolution(Tensor self, Tensor weight, Tensor? bias, SymInt[] padding, SymInt[] stride, SymInt[] dilation, SymInt groups) -> Tensor + self, weight, bias: "grad.defined() ? convolution_backward_symint(grad, self, weight, bias->sym_sizes(), stride, padding, dilation, /*transposed=*/ false, /*output_padding=*/ std::vector(padding.size(), 0), groups, grad_input_mask) : std::tuple()" + +- name: mkldnn_linear(Tensor self, Tensor weight, Tensor? 
+  self, weight, bias: mkldnn_linear_backward(self, grad, weight, grad_input_mask)
+
+- name: mkldnn_max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, int[2] dilation=1, bool ceil_mode=False) -> Tensor
+  self: mkldnn_max_pool2d_backward(grad, result, self, kernel_size, stride, padding, dilation, ceil_mode)
+
+- name: mkldnn_max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=0, int[3] dilation=1, bool ceil_mode=False) -> Tensor
+  self: mkldnn_max_pool3d_backward(grad, result, self, kernel_size, stride, padding, dilation, ceil_mode)
+
+- name: mkldnn_adaptive_avg_pool2d(Tensor self, int[2] output_size) -> Tensor
+  self: mkldnn_adaptive_avg_pool2d_backward(grad, self)
+
+- name: _mkldnn_reshape(Tensor self, int[] shape) -> Tensor
+  self: grad.reshape_symint(self.sym_sizes())
+
+# NestedTensor
+- name: _nested_tensor_from_tensor_list(Tensor[] list, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
+  list: "grad.defined() ? at::unbind(grad) : std::vector<Tensor>(list.size())"
+
+- name: _nested_tensor_from_mask(Tensor t, Tensor mask, bool mask_check=True) -> Tensor
+  t: grad.to_padded_tensor_symint(0, t.sym_sizes())
+  mask: non_differentiable
+
+- name: _nested_from_padded(Tensor padded, Tensor cpu_nested_shape_example, bool fuse_transform_0213=False) -> Tensor
+  padded: _nested_from_padded_backward(grad, padded, fuse_transform_0213)
+  cpu_nested_shape_example: non_differentiable
+
+- name: to_padded_tensor(Tensor self, float padding, SymInt[]? output_size=None) -> Tensor
+  self: "self.layout() == c10::kJagged ? at::_nested_from_padded_tensor_symint(grad, at::_nested_get_offsets(self), at::_nested_get_jagged_dummy(self), at::_nested_get_ragged_idx(self), at::_nested_get_min_seqlen(self).defined() ? std::optional<Tensor>(at::_nested_get_min_seqlen(self)) : ::std::nullopt, at::_nested_get_max_seqlen(self).defined() ? std::optional<Tensor>(at::_nested_get_max_seqlen(self)) : ::std::nullopt, std::optional<c10::SymInt>(at::_nested_get_values(self).sym_size(0))) : at::_nested_from_padded(grad, self._nested_tensor_size())"
+  padding: non_differentiable
+
+- name: _nested_from_padded_tensor(Tensor padded, Tensor offsets, Tensor dummy, int ragged_idx=1, Tensor? min_seqlen=None, Tensor? max_seqlen=None, SymInt? sum_S=None) -> Tensor
+  padded: grad.to_padded_tensor_symint(0.0, at::OptionalArrayRef<c10::SymInt>(padded.sym_sizes()))
+  offsets: non_differentiable
+  dummy: non_differentiable
+
+- name: _nested_view_from_buffer(Tensor(a) self, Tensor nested_size, Tensor nested_strides, Tensor offsets) -> Tensor(a)
+  self: grad.values()
+  nested_size: non_differentiable
+  nested_strides: non_differentiable
+
+- name: _nested_view_from_jagged(Tensor(a) self, Tensor offsets, Tensor dummy, Tensor? lengths=None, int ragged_idx=1, Tensor? min_seqlen=None, Tensor? max_seqlen=None) -> Tensor(a)
+  self: grad.values()
+  offsets: non_differentiable
+  lengths: non_differentiable
+  dummy: non_differentiable
+  min_seqlen: non_differentiable
+  max_seqlen: non_differentiable
+
+- name: _nested_get_values(Tensor(a) self) -> Tensor(a)
+  self: "_nested_view_from_jagged(grad, at::_nested_get_offsets(self), at::_nested_get_jagged_dummy(self), at::_nested_get_lengths(self), at::_nested_get_ragged_idx(self), at::_nested_get_min_seqlen(self).defined() ? std::optional<Tensor>(at::_nested_get_min_seqlen(self)) : ::std::nullopt, at::_nested_get_max_seqlen(self).defined() ? std::optional<Tensor>(at::_nested_get_max_seqlen(self)) : ::std::nullopt)"
+
+# Transformer
+- name: _safe_softmax(Tensor self, int dim, ScalarType? dtype=None) -> Tensor
+  self: _softmax_backward_data(grad, result, dim, self.scalar_type())
+  result: result * (self_t - safe_logsumexp_jvp(self_p, self_t, {dim}, true))
+
+- name: _scaled_dot_product_efficient_attention(Tensor query, Tensor key, Tensor value, Tensor? attn_bias, bool compute_log_sumexp, float dropout_p=0.0, bool is_causal=False, *, float? scale=None) -> (Tensor output, Tensor log_sumexp, Tensor philox_seed, Tensor philox_offset)
+  output_differentiability: [True, False, False, False]
+  query, key, value, attn_bias: _scaled_dot_product_efficient_attention_backward(grad, query, key, value, attn_bias, output, log_sumexp, philox_seed, philox_offset, dropout_p, grad_input_mask, is_causal, scale)
+
+- name: _scaled_dot_product_flash_attention(Tensor query, Tensor key, Tensor value, float dropout_p=0.0, bool is_causal=False, bool return_debug_mask=False, *, float? scale=None) -> (Tensor output, Tensor logsumexp, Tensor cum_seq_q, Tensor cum_seq_k, SymInt max_q, SymInt max_k, Tensor rng_state, Tensor unused, Tensor debug_attn_mask)
+  output_differentiability: [True, False, False, False, False, False, False, False, False]
+  query, key, value: _scaled_dot_product_flash_attention_backward_symint(grad, query, key, value, output, logsumexp, cum_seq_q, cum_seq_k, max_q, max_k, dropout_p, is_causal, rng_state, unused, scale)
+
+- name: _scaled_dot_product_flash_attention_for_cpu(Tensor query, Tensor key, Tensor value, float dropout_p=0.0, bool is_causal=False, *, Tensor? attn_mask=None, float? scale=None) -> (Tensor output, Tensor logsumexp)
+  output_differentiability: [True, False]
+  query, key, value: _scaled_dot_product_flash_attention_for_cpu_backward(grad, query, key, value, output, logsumexp, dropout_p, is_causal, attn_mask, scale)
+
+- name: _flash_attention_forward(Tensor query, Tensor key, Tensor value, Tensor? cum_seq_q, Tensor? cum_seq_k, SymInt max_q, SymInt max_k, float dropout_p, bool is_causal, bool return_debug_mask, *, float? scale=None, SymInt? window_size_left=None, SymInt? window_size_right=None, Tensor? seqused_k=None, Tensor? alibi_slopes=None) -> (Tensor output, Tensor softmax_logsumexp, Tensor rng_state, Tensor unused, Tensor debug_attn_mask)
+  output_differentiability: [True, False, False, False, False]
+  query, key, value: _flash_attention_backward_symint(grad, query, key, value, output, softmax_logsumexp, cum_seq_q, cum_seq_k, max_q, max_k, dropout_p, is_causal, rng_state, unused, scale, window_size_left, window_size_right)
+
+- name: _efficient_attention_forward(Tensor query, Tensor key, Tensor value, Tensor? bias, Tensor? cu_seqlens_q, Tensor? cu_seqlens_k, SymInt? max_seqlen_q, SymInt? max_seqlen_k, float dropout_p, int custom_mask_type, bool compute_log_sumexp=False, *, float? scale=None, Tensor? seqlen_k=None, int? window_size=None) -> (Tensor output, Tensor logsumexp, Tensor philox_seed, Tensor philox_offset, SymInt max_seqlen_batch_q, SymInt max_seqlen_batch_k)
+  output_differentiability: [True, False, False, False, False, False]
+  query, key, value, bias: _efficient_attention_backward_symint(grad, query, key, value, bias, output, cu_seqlens_q, cu_seqlens_k, max_seqlen_batch_q, max_seqlen_batch_k, logsumexp, dropout_p, philox_seed, philox_offset, custom_mask_type, bias.requires_grad(), scale)
+
+- name: _cudnn_attention_forward(Tensor query, Tensor key, Tensor value, Tensor? attn_bias, Tensor? cum_seq_q, Tensor? cum_seq_k, SymInt max_q, SymInt max_k, bool compute_log_sumexp, float dropout_p=0.0, bool is_causal=False, bool return_debug_mask=False, *, float? scale=None) -> (Tensor output, Tensor logsumexp, Tensor cum_seq_q, Tensor cum_seq_k, SymInt max_q, SymInt max_k, Tensor philox_seed, Tensor philox_offset, Tensor debug_attn_mask)
+  output_differentiability: [True, False, False, False, False, False, False, False, False]
+  query, key, value: _cudnn_attention_backward_symint(grad, query, key, value, output, logsumexp, philox_seed, philox_offset, attn_bias, cum_seq_q, cum_seq_k, max_q, max_k, dropout_p, is_causal, scale)
+
+- name: _scaled_dot_product_cudnn_attention(Tensor query, Tensor key, Tensor value, Tensor? attn_bias, bool compute_log_sumexp, float dropout_p=0.0, bool is_causal=False, bool return_debug_mask=False, *, float? scale=None) -> (Tensor output, Tensor logsumexp, Tensor cum_seq_q, Tensor cum_seq_k, SymInt max_q, SymInt max_k, Tensor philox_seed, Tensor philox_offset, Tensor debug_attn_mask)
+  output_differentiability: [True, False, False, False, False, False, False, False, False]
+  query, key, value: _scaled_dot_product_cudnn_attention_backward_symint(grad, query, key, value, output, logsumexp, philox_seed, philox_offset, attn_bias, cum_seq_q, cum_seq_k, max_q, max_k, dropout_p, is_causal, scale)
+
+- name: _scaled_dot_product_fused_attention_overrideable(Tensor query, Tensor key, Tensor value, Tensor? attn_bias=None, float dropout_p=0.0, bool is_causal=False, bool return_debug_mask=False, *, float? scale=None) -> (Tensor output, Tensor logsumexp, Tensor cum_seq_q, Tensor cum_seq_k, SymInt max_q, SymInt max_k, Tensor philox_seed, Tensor philox_offset, Tensor debug_attn_mask)
+  output_differentiability: [True, False, False, False, False, False, False, False, False]
+  query, key, value, attn_bias: _scaled_dot_product_fused_attention_overrideable_backward_symint(grad, query, key, value, attn_bias, grad_input_mask, output, logsumexp, cum_seq_q, cum_seq_k, max_q, max_k, dropout_p, is_causal, philox_seed, philox_offset, scale)
+
+# fft
+- name: _fft_r2c(Tensor self, int[] dim, int normalization, bool onesided) -> Tensor
+  self: fft_r2c_backward(grad, dim, normalization, onesided, self.sym_size(dim.back()))
+  result: auto_linear
+
+- name: _fft_c2r(Tensor self, int[] dim, int normalization, SymInt last_dim_size) -> Tensor
+  self: fft_c2r_backward(grad, dim, normalization)
+  result: auto_linear
+
+- name: _fft_c2c(Tensor self, SymInt[] dim, int normalization, bool forward) -> Tensor
+  self: _fft_c2c_symint(grad, dim, normalization, !forward)
+  result: auto_linear
+
+- name: unbind.int(Tensor(a -> *) self, int dim=0) -> Tensor(a)[]
+  dispatch:
+    Default:
+      self: unbind_backward(grads, dim)
+      result: auto_linear
+    AutogradNestedTensor:
+      self: "self.layout() == c10::kJagged ? unbind_backward_nested_jagged(grads, self, dim) : unbind_backward_nested(grads, at::native::get_nested_tensor_impl(self)->get_nested_sizes(), dim, self.options())"
+      result: auto_linear
+
+- name: stack(Tensor[] tensors, int dim=0) -> Tensor
+  tensors: stack_tensors_backward(grad, dim, to_args_scalartypes(tensors))
+  result: stack_jvp(tensors, dim)
+
+# fused RNN kernels
+
+# Only the first two of the _thnn_fused_lstm_cell outputs can have gradients.
+# _thnn_fused_lstm_cell outputs: (hy, cy, workspace)
+- name: _thnn_fused_lstm_cell(Tensor input_gates, Tensor hidden_gates, Tensor cx, Tensor? input_bias=None, Tensor? hidden_bias=None) -> (Tensor, Tensor, Tensor)
+  output_differentiability: [True, True, False]
+  input_gates, hidden_gates, cx, input_bias, hidden_bias: "GradMode::is_enabled() ? _thnn_differentiable_lstm_cell_backward(grads[0], grads[1], input_gates, hidden_gates, input_bias, hidden_bias, cx, result1) : _thnn_fused_lstm_cell_backward(grads[0], grads[1], cx, result1, result2, input_bias.defined())"
+
+- name: _thnn_fused_gru_cell(Tensor input_gates, Tensor hidden_gates, Tensor hx, Tensor? input_bias=None, Tensor? hidden_bias=None) -> (Tensor, Tensor)
+  input_gates, hidden_gates, hx, input_bias, hidden_bias: "grad.defined() ? (GradMode::is_enabled() ? _thnn_differentiable_gru_cell_backward(grad, input_gates, hidden_gates, hx, input_bias, hidden_bias) : _thnn_fused_gru_cell_backward(grad, result1, input_bias.defined())) : std::tuple<Tensor, Tensor, Tensor, Tensor, Tensor>()"
+
+# PackedSequence helpers
+- name: _pack_padded_sequence(Tensor input, Tensor lengths, bool batch_first) -> (Tensor, Tensor)
+  input: _pack_padded_sequence_backward_symint(grad, input.sym_sizes(), result1, batch_first)
+
+# TH wrappers
+- name: eq.Scalar(Tensor self, Scalar other) -> Tensor
+  output_differentiability: [False]
+
+- name: eq.Tensor(Tensor self, Tensor other) -> Tensor
+  output_differentiability: [False]
+
+- name: ge.Scalar(Tensor self, Scalar other) -> Tensor
+  output_differentiability: [False]
+
+- name: ge.Tensor(Tensor self, Tensor other) -> Tensor
+  output_differentiability: [False]
+
+- name: gt.Scalar(Tensor self, Scalar other) -> Tensor
+  output_differentiability: [False]
+
+- name: gt.Tensor(Tensor self, Tensor other) -> Tensor
+  output_differentiability: [False]
+
+- name: le.Scalar(Tensor self, Scalar other) -> Tensor
+  output_differentiability: [False]
+
+- name: le.Tensor(Tensor self, Tensor other) -> Tensor
+  output_differentiability: [False]
+
+- name: lt.Scalar(Tensor self, Scalar other) -> Tensor
+  output_differentiability: [False]
+
+- name: lt.Tensor(Tensor self, Tensor other) -> Tensor
+  output_differentiability: [False]
+
+- name: ne.Scalar(Tensor self, Scalar other) -> Tensor
+  output_differentiability: [False]
+
+- name: ne.Tensor(Tensor self, Tensor other) -> Tensor
+  output_differentiability: [False]
+
+- name: multinomial(Tensor self, SymInt num_samples, bool replacement=False, *, Generator? generator=None) -> Tensor
+  output_differentiability: [False]
+
+- name: nonzero(Tensor self) -> Tensor
+  output_differentiability: [False]
+
+- name: segment_reduce(Tensor data, str reduce, *, Tensor? lengths=None, Tensor? indices=None, Tensor? offsets=None, int axis=0, bool unsafe=False, Scalar? initial=None) -> Tensor
+  data: _segment_reduce_backward(grad, result, data, reduce, lengths, offsets, axis, initial)
+
+- name: _pin_memory(Tensor self, Device?
device=None) -> Tensor + self: grad + +- name: _new_zeros_with_same_feature_meta(Tensor self, Tensor other, *, int self_num_batch_dims=0) -> Tensor + self: non_differentiable + other: non_differentiable + output_differentiability: [False] + +- name: _test_warn_in_autograd(Tensor self) -> Tensor + self: warn_backwards(grad) + +- name: _test_autograd_multiple_dispatch.fullcoverage(Tensor self) -> Tensor + dispatch: + Default: + self: grad.expand_symint(self.sym_sizes()) + 1 + result: auto_linear + AutogradNestedTensor: + self: grad.mul(grad) + AutogradCUDA: + self: grad.expand_symint(self.sym_sizes()) * 2 + +- name: _test_autograd_multiple_dispatch.ntonly(Tensor self, bool b) -> Tensor + dispatch: + AutogradNestedTensor: + self: grad.mul(grad).add(grad) + +- name: _test_autograd_multiple_dispatch_view(Tensor(a) self) -> Tensor(a) + dispatch: + Default: + self: grad.reshape_as(self) + AutogradCUDA: + self: grad.reshape_as(self) + 1 + +- name: _efficientzerotensor(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + output_differentiability: [False] + +- name: scatter_reduce.two(Tensor self, int dim, Tensor index, Tensor src, str reduce, *, bool include_self=True) -> Tensor + self, src: scatter_reduce_backward(grad, self, dim, index, src, reduce, include_self, result) + index: non_differentiable + result: scatter_reduce_jvp(self_p, self_t, dim, index, src_p, src_t, reduce, include_self, result) + +- name: special_airy_ai(Tensor x) -> Tensor + x: non_differentiable + +- name: special_bessel_j0(Tensor self) -> Tensor + self: non_differentiable + +- name: special_bessel_j1(Tensor self) -> Tensor + self: non_differentiable + +- name: special_bessel_y0(Tensor self) -> Tensor + self: non_differentiable + +- name: special_bessel_y1(Tensor self) -> Tensor + self: non_differentiable + +- name: special_chebyshev_polynomial_t(Tensor x, Tensor n) -> Tensor + x: non_differentiable + n: non_differentiable + +- name: special_chebyshev_polynomial_t.x_scalar(Scalar x, Tensor n) -> Tensor + n: non_differentiable + +- name: special_chebyshev_polynomial_t.n_scalar(Tensor x, Scalar n) -> Tensor + x: non_differentiable + +- name: special_chebyshev_polynomial_u(Tensor x, Tensor n) -> Tensor + x: non_differentiable + n: non_differentiable + +- name: special_chebyshev_polynomial_u.x_scalar(Scalar x, Tensor n) -> Tensor + n: non_differentiable + +- name: special_chebyshev_polynomial_u.n_scalar(Tensor x, Scalar n) -> Tensor + x: non_differentiable + +- name: special_chebyshev_polynomial_v(Tensor x, Tensor n) -> Tensor + x: non_differentiable + n: non_differentiable + +- name: special_chebyshev_polynomial_v.x_scalar(Scalar x, Tensor n) -> Tensor + n: non_differentiable + +- name: special_chebyshev_polynomial_v.n_scalar(Tensor x, Scalar n) -> Tensor + x: non_differentiable + +- name: special_chebyshev_polynomial_w(Tensor x, Tensor n) -> Tensor + x: non_differentiable + n: non_differentiable + +- name: special_chebyshev_polynomial_w.x_scalar(Scalar x, Tensor n) -> Tensor + n: non_differentiable + +- name: special_chebyshev_polynomial_w.n_scalar(Tensor x, Scalar n) -> Tensor + x: non_differentiable + +- name: special_hermite_polynomial_h(Tensor x, Tensor n) -> Tensor + x: non_differentiable + n: non_differentiable + +- name: special_hermite_polynomial_h.x_scalar(Scalar x, Tensor n) -> Tensor + n: non_differentiable + +- name: special_hermite_polynomial_h.n_scalar(Tensor x, Scalar n) -> Tensor + x: non_differentiable + +- name: 
special_hermite_polynomial_he(Tensor x, Tensor n) -> Tensor + x: non_differentiable + n: non_differentiable + +- name: special_hermite_polynomial_he.x_scalar(Scalar x, Tensor n) -> Tensor + n: non_differentiable + +- name: special_hermite_polynomial_he.n_scalar(Tensor x, Scalar n) -> Tensor + x: non_differentiable + +- name: special_laguerre_polynomial_l(Tensor x, Tensor n) -> Tensor + x: non_differentiable + n: non_differentiable + +- name: special_laguerre_polynomial_l.x_scalar(Scalar x, Tensor n) -> Tensor + n: non_differentiable + +- name: special_laguerre_polynomial_l.n_scalar(Tensor x, Scalar n) -> Tensor + x: non_differentiable + +- name: special_legendre_polynomial_p(Tensor x, Tensor n) -> Tensor + x: non_differentiable + n: non_differentiable + +- name: special_legendre_polynomial_p.x_scalar(Scalar x, Tensor n) -> Tensor + n: non_differentiable + +- name: special_legendre_polynomial_p.n_scalar(Tensor x, Scalar n) -> Tensor + x: non_differentiable + +- name: special_modified_bessel_i0(Tensor self) -> Tensor + self: non_differentiable + +- name: special_modified_bessel_i1(Tensor self) -> Tensor + self: non_differentiable + +- name: special_modified_bessel_k0(Tensor self) -> Tensor + self: non_differentiable + +- name: special_modified_bessel_k1(Tensor self) -> Tensor + self: non_differentiable + +- name: special_scaled_modified_bessel_k0(Tensor x) -> Tensor + x: non_differentiable + +- name: special_scaled_modified_bessel_k1(Tensor x) -> Tensor + x: non_differentiable + +- name: special_shifted_chebyshev_polynomial_t(Tensor x, Tensor n) -> Tensor + x: non_differentiable + n: non_differentiable + +- name: special_shifted_chebyshev_polynomial_t.x_scalar(Scalar x, Tensor n) -> Tensor + n: non_differentiable + +- name: special_shifted_chebyshev_polynomial_t.n_scalar(Tensor x, Scalar n) -> Tensor + x: non_differentiable + +- name: special_shifted_chebyshev_polynomial_u(Tensor x, Tensor n) -> Tensor + x: non_differentiable + n: non_differentiable + +- name: special_shifted_chebyshev_polynomial_u.x_scalar(Scalar x, Tensor n) -> Tensor + n: non_differentiable + +- name: special_shifted_chebyshev_polynomial_u.n_scalar(Tensor x, Scalar n) -> Tensor + x: non_differentiable + +- name: special_shifted_chebyshev_polynomial_v(Tensor x, Tensor n) -> Tensor + x: non_differentiable + n: non_differentiable + +- name: special_shifted_chebyshev_polynomial_v.x_scalar(Scalar x, Tensor n) -> Tensor + n: non_differentiable + +- name: special_shifted_chebyshev_polynomial_v.n_scalar(Tensor x, Scalar n) -> Tensor + x: non_differentiable + +- name: special_shifted_chebyshev_polynomial_w(Tensor x, Tensor n) -> Tensor + x: non_differentiable + n: non_differentiable + +- name: special_shifted_chebyshev_polynomial_w.x_scalar(Scalar x, Tensor n) -> Tensor + n: non_differentiable + +- name: special_shifted_chebyshev_polynomial_w.n_scalar(Tensor x, Scalar n) -> Tensor + x: non_differentiable + +- name: special_spherical_bessel_j0(Tensor x) -> Tensor + x: non_differentiable + +- name: _reshape_copy(Tensor self, SymInt[] size) -> Tensor + self: grad.reshape_symint(self.sym_sizes()) + result: auto_linear + +# note(crcrpar): `torchgen/api/autograd` logic would unwantedly replace substrings of `self` and `other` of function names. 
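A minimal Python sketch of the substring hazard the note above warns about (an editor's illustration, not the actual torchgen code): a plain `str.replace` also rewrites the `self` embedded in a backward-function's name, while a word-boundary regex only touches the standalone argument.

```python
import re

# Hypothetical formula resembling the _foreach definitions that follow.
formula = "div_tensor_self_backward(grads, other, self.scalar_type())"

# Naive substring replacement corrupts the function name itself:
# 'div_tensor_self[i]_backward(grads, other, self[i].scalar_type())'
naive = formula.replace("self", "self[i]")

# A word-boundary match leaves 'div_tensor_self_backward' intact:
# 'div_tensor_self_backward(grads, other, self[i].scalar_type())'
safe = re.sub(r"\bself\b", "self[i]", formula)
```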
+- name: _foreach_div.List(Tensor[] self, Tensor[] other) -> Tensor[] + self: div_tensor_self_backward(grads[i], other[i], self[i].scalar_type()) + other: div_tensor_other_backward(grads[i], self[i], other[i]) + result: (self_t - other_t * result[i]) / other_p + +- name: _foreach_pow.List(Tensor[] self, Tensor[] exponent) -> Tensor[] + self: pow_backward_self(grads[i], self[i], exponent[i]) + exponent: pow_backward_exponent(grads[i], self[i], exponent[i], result[i]) + result: (pow_backward_self(self_t.conj(), self_p, exponent_p) + pow_backward_exponent(exponent_t.conj(), self_p, exponent_p, result[i])).conj() + +- name: _foreach_pow.ScalarList(Tensor[] self, Scalar[] exponent) -> Tensor[] + self: pow_backward(grads[i], self[i], exponent[i]) + result: pow_backward(self_t.conj(), self_p, exponent[i]).conj() + +- name: _foreach_pow.ScalarAndTensor(Scalar self, Tensor[] exponent) -> Tensor[] + exponent: pow_backward_exponent(grads[i], self, exponent[i], result[i]) + +# note(crcrpar): following definitions seem necessary because the reference native functions +# of `maximum` and `minimum` don't have the overload def with Scalar as their second argument. +- name: _foreach_minimum.Scalar(Tensor[] self, Scalar scalar) -> Tensor[] + self: at::where(self[i] == scalar, grads[i] / 2, grads[i]).masked_fill_(self[i] > scalar, 0) + result: scalar + at::where(self_p == scalar, at::scalar_tensor(0.5, result[i].options()), (self_p < scalar).to(result[i].scalar_type())) * (self_t - scalar) + +- name: _foreach_minimum.ScalarList(Tensor[] self, Scalar[] scalars) -> Tensor[] + self: at::where(self[i] == scalars[i], grads[i] / 2, grads[i]).masked_fill_(self[i] > scalars[i], 0) + result: scalars[i] + at::where(self_p == scalars[i], at::scalar_tensor(0.5, result[i].options()), (self_p < scalars[i]).to(result[i].scalar_type())) * (self_t - scalars[i]) + +- name: _foreach_maximum.Scalar(Tensor[] self, Scalar scalar) -> Tensor[] + self: at::where(self[i] == scalar, grads[i] / 2, grads[i]).masked_fill_(self[i] < scalar, 0) + result: scalar + at::where(self_p == scalar, at::scalar_tensor(0.5, result[i].options()), (self_p > scalar).to(result[i].scalar_type())) * (self_t - scalar) + +- name: _foreach_maximum.ScalarList(Tensor[] self, Scalar[] scalars) -> Tensor[] + self: at::where(self[i] == scalars[i], grads[i] / 2, grads[i]).masked_fill_(self[i] < scalars[i], 0) + result: scalars[i] + at::where(self_p == scalars[i], at::scalar_tensor(0.5, result[i].options()), (self_p > scalars[i]).to(result[i].scalar_type())) * (self_t - scalars[i]) + +# note(crcrpar): forward-mode AD is tricky for a simple string replace to handle: +# formula.replace("p", "ord") produces `norm_jvord(self_ord, self_t, ord, result)` +- name: _foreach_norm.Scalar(Tensor[] self, Scalar ord=2, ScalarType? dtype=None) -> Tensor[] + self: norm_backward(grads[i], self[i], ord, result[i]) + result: norm_jvp(self_p, self_t, ord, result[i]) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_annotated_fn_args.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_annotated_fn_args.py new file mode 100644 index 0000000000000000000000000000000000000000..2f61209fa6fd0041b732f1400e1162d2f124ad34 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_annotated_fn_args.py @@ -0,0 +1,134 @@ +""" +For procedural tests needed for __torch_function__, we use this function +to export method names and signatures as needed by the tests in +test/test_overrides.py. 
+ +python -m tools.autograd.gen_annotated_fn_args \ + aten/src/ATen/native/native_functions.yaml \ + aten/src/ATen/native/tags.yaml \ + $OUTPUT_DIR \ + tools/autograd + +Where $OUTPUT_DIR is where you would like the files to be +generated. In the full build system, OUTPUT_DIR is +torch/testing/_internal/generated +""" + +from __future__ import annotations + +import argparse +import os +import textwrap +from collections import defaultdict +from typing import Any, TYPE_CHECKING + +import torchgen.api.python as python +from torchgen.context import with_native_function +from torchgen.gen import parse_native_yaml +from torchgen.utils import FileManager + +from .gen_python_functions import ( + is_py_fft_function, + is_py_linalg_function, + is_py_nn_function, + is_py_special_function, + is_py_torch_function, + is_py_variable_method, + should_generate_py_binding, +) + + +if TYPE_CHECKING: + from collections.abc import Sequence + + from torchgen.model import Argument, BaseOperatorName, NativeFunction + + +def gen_annotated( + native_yaml_path: str, tags_yaml_path: str, out: str, autograd_dir: str +) -> None: + native_functions = parse_native_yaml( + native_yaml_path, tags_yaml_path + ).native_functions + mappings = ( + (is_py_torch_function, "torch._C._VariableFunctions"), + (is_py_nn_function, "torch._C._nn"), + (is_py_linalg_function, "torch._C._linalg"), + (is_py_special_function, "torch._C._special"), + (is_py_fft_function, "torch._C._fft"), + (is_py_variable_method, "torch.Tensor"), + ) + annotated_args: list[str] = [] + for pred, namespace in mappings: + groups: dict[BaseOperatorName, list[NativeFunction]] = defaultdict(list) + for f in native_functions: + if not should_generate_py_binding(f) or not pred(f): + continue + groups[f.func.name.name].append(f) + for group in groups.values(): + for f in group: + annotated_args.append(f"{namespace}.{gen_annotated_args(f)}") + + template_path = os.path.join(autograd_dir, "templates") + fm = FileManager(install_dir=out, template_dir=template_path, dry_run=False) + fm.write_with_template( + "annotated_fn_args.py", + "annotated_fn_args.py.in", + lambda: { + "annotated_args": textwrap.indent("\n".join(annotated_args), " "), + }, + ) + + +@with_native_function +def gen_annotated_args(f: NativeFunction) -> str: + def _get_kwargs_func_exclusion_list() -> list[str]: + # functions that currently don't work with kwargs in test_overrides.py + return [ + "diagonal", + "round_", + "round", + "scatter_", + ] + + def _add_out_arg( + out_args: list[dict[str, Any]], args: Sequence[Argument], *, is_kwarg_only: bool + ) -> None: + for arg in args: + if arg.default is not None: + continue + out_arg: dict[str, Any] = {} + out_arg["is_kwarg_only"] = str(is_kwarg_only) + out_arg["name"] = arg.name + out_arg["simple_type"] = python.argument_type_str( + arg.type, simple_type=True + ) + size_t = python.argument_type_size(arg.type) + if size_t: + out_arg["size"] = size_t + out_args.append(out_arg) + + out_args: list[dict[str, Any]] = [] + _add_out_arg(out_args, f.func.arguments.flat_positional, is_kwarg_only=False) + if f"{f.func.name.name}" not in _get_kwargs_func_exclusion_list(): + _add_out_arg(out_args, f.func.arguments.flat_kwarg_only, is_kwarg_only=True) + + return f"{f.func.name.name}: {repr(out_args)}," + + +def main() -> None: + parser = argparse.ArgumentParser(description="Generate annotated_fn_args script") + parser.add_argument( + "native_functions", metavar="NATIVE", help="path to native_functions.yaml" + ) + parser.add_argument("tags", metavar="TAGS", help="path 
to tags.yaml") + parser.add_argument("out", metavar="OUT", help="path to output directory") + parser.add_argument( + "autograd", metavar="AUTOGRAD", help="path to template directory" + ) + args = parser.parse_args() + gen_annotated(args.native_functions, args.tags, args.out, args.autograd) + + +if __name__ == "__main__": + main() diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_autograd.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_autograd.py new file mode 100644 index 0000000000000000000000000000000000000000..d93d3f4cab4a6f37c0c81c548b4da3b6c5b9dc95 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_autograd.py @@ -0,0 +1,147 @@ +""" +To run this file by hand from the root of the PyTorch +repository, run: + +python -m tools.autograd.gen_autograd \ + aten/src/ATen/native/native_functions.yaml \ + aten/src/ATen/native/tags.yaml \ + $OUTPUT_DIR \ + tools/autograd + +Where $OUTPUT_DIR is where you would like the files to be +generated. In the full build system, OUTPUT_DIR is +torch/csrc/autograd/generated/ +""" + +# gen_autograd.py generates C++ autograd functions and Python bindings. +# +# It delegates to the following scripts: +# +# gen_autograd_functions.py: generates subclasses of torch::autograd::Node +# gen_variable_type.py: generates VariableType.h which contains all tensor methods +# gen_python_functions.py: generates Python bindings to THPVariable +# + +from __future__ import annotations + +import argparse +import os + +from torchgen.api import cpp +from torchgen.api.autograd import ( + match_differentiability_info, + NativeFunctionWithDifferentiabilityInfo, +) +from torchgen.gen import parse_native_yaml +from torchgen.selective_build.selector import SelectiveBuilder + +from . 
import gen_python_functions +from .gen_autograd_functions import ( + gen_autograd_functions_lib, + gen_autograd_functions_python, +) +from .gen_inplace_or_view_type import gen_inplace_or_view_type +from .gen_trace_type import gen_trace_type +from .gen_variable_factories import gen_variable_factories +from .gen_variable_type import gen_variable_type +from .gen_view_funcs import gen_view_funcs +from .load_derivatives import load_derivatives + + +def gen_autograd( + native_functions_path: str, + tags_path: str, + out: str, + autograd_dir: str, + operator_selector: SelectiveBuilder, + disable_autograd: bool = False, +) -> None: + # Parse and load derivatives.yaml + differentiability_infos, used_dispatch_keys = load_derivatives( + os.path.join(autograd_dir, "derivatives.yaml"), native_functions_path, tags_path + ) + + template_path = os.path.join(autograd_dir, "templates") + + native_funcs = parse_native_yaml(native_functions_path, tags_path).native_functions + fns = sorted( + filter( + operator_selector.is_native_function_selected_for_training, native_funcs + ), + key=lambda f: cpp.name(f.func), + ) + fns_with_diff_infos: list[NativeFunctionWithDifferentiabilityInfo] = ( + match_differentiability_info(fns, differentiability_infos) + ) + + # Generate VariableType.h/cpp + if not disable_autograd: + gen_variable_type( + out, + native_functions_path, + tags_path, + fns_with_diff_infos, + template_path, + used_dispatch_keys, + ) + + gen_inplace_or_view_type( + out, native_functions_path, tags_path, fns_with_diff_infos, template_path + ) + + # operator filter not applied as tracing sources are excluded in selective build + gen_trace_type(out, native_funcs, template_path) + # Generate Functions.h/cpp + gen_autograd_functions_lib(out, differentiability_infos, template_path) + + # Generate variable_factories.h + gen_variable_factories(out, native_functions_path, tags_path, template_path) + + # Generate ViewFuncs.h/cpp + gen_view_funcs(out, fns_with_diff_infos, template_path) + + +def gen_autograd_python( + native_functions_path: str, + tags_path: str, + out: str, + autograd_dir: str, +) -> None: + differentiability_infos, _ = load_derivatives( + os.path.join(autograd_dir, "derivatives.yaml"), native_functions_path, tags_path + ) + + template_path = os.path.join(autograd_dir, "templates") + + # Generate Functions.h/cpp + gen_autograd_functions_python(out, differentiability_infos, template_path) + + # Generate Python bindings + deprecated_path = os.path.join(autograd_dir, "deprecated.yaml") + gen_python_functions.gen( + out, native_functions_path, tags_path, deprecated_path, template_path + ) + + +def main() -> None: + parser = argparse.ArgumentParser(description="Generate autograd C++ files script") + parser.add_argument( + "native_functions", metavar="NATIVE", help="path to native_functions.yaml" + ) + parser.add_argument("tags", metavar="NATIVE", help="path to tags.yaml") + parser.add_argument("out", metavar="OUT", help="path to output directory") + parser.add_argument( + "autograd", metavar="AUTOGRAD", help="path to autograd directory" + ) + args = parser.parse_args() + gen_autograd( + args.native_functions, + args.tags, + args.out, + args.autograd, + SelectiveBuilder.get_nop_selector(), + ) + + +if __name__ == "__main__": + main() diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_autograd_functions.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_autograd_functions.py new file mode 100644 index 
0000000000000000000000000000000000000000..d32562374d5f6e85cad18f314fbbf2d3cf415985 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_autograd_functions.py @@ -0,0 +1,1076 @@ +# Generates C++ autograd functions for the derivatives of ATen operations +# +# This writes two files: +# Functions.h/cpp: subclasses of autograd::Node +# python_functions.h/cpp: Python bindings for the above classes +# + +from __future__ import annotations + +from typing import TYPE_CHECKING + +from torchgen.api.autograd import ( + Derivative, + DifferentiabilityInfo, + SavedAttribute, + uses_retain_variables, + uses_single_grad, +) +from torchgen.api.types import ( + ArrayRefCType, + BaseCppType, + BaseCType, + Binding, + boolT, + doubleT, + intArrayRefT, + iTensorListRefT, + ListCType, + longT, + MutRefCType, + OptionalCType, + optionalIntArrayRefT, + optionalSymIntArrayRefT, + scalarT, + stringT, + symIntArrayRefT, + SymIntT, + TENSOR_LIST_LIKE_CTYPES, + tensorListT, + tensorT, + VectorCType, +) +from torchgen.code_template import CodeTemplate +from torchgen.model import Argument, FunctionSchema +from torchgen.utils import FileManager + +from .gen_inplace_or_view_type import VIEW_FUNCTIONS + + +if TYPE_CHECKING: + from collections.abc import Sequence + + +FUNCTION_DECLARATION = CodeTemplate( + """\ +#ifdef _WIN32 +struct ${op} : public ${superclass} { + TORCH_API ${op}() = default; +#else +struct TORCH_API ${op} : public ${superclass} { +#endif + using ${superclass}::${superclass}; + variable_list apply(variable_list&& grads) override; + std::string name() const override { return "${op}"; } + void release_variables() override { + ${thread_lock} + ${release_variables} + } + ${will_release_variables} + void compiled_args(CompiledNodeArgs& args) const override; + variable_list apply_with_saved(const variable_list& inputs, SwapSavedVariables& saved) override; + ${saved_variables} + ${saved_list_sizes} +}; +""" +) + +WILL_RELEASE_VARIABLES = CodeTemplate( + """\ +bool retain_variables = true; +void will_release_variables() override { + retain_variables = false; +} +""" +) + +# We generate e.g. MulBackward0::apply and have that call into +# MulBackward0_apply_functional. The apply_functional is a pure function, +# that is, it does not rely on global state. MulBackward0::apply +# is responsible for querying the autograd engine for which outputs should +# be computed (needs_input_grad), applying locks, +# and unpacking saved variables to pass to MulBackward0_apply_functional. +# +# needs_input_grad is a mapping from input index to if that input needs +# gradients computed. For operators that take in List[Tensor], the List[Tensor] +# is one element in the needs_input_grad that specifies if *any* of the +# List[Tensor] needs input grad. In theory this could be optimized. 
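To make the split described above concrete, here is a rough Python analogue (an editor's sketch with hypothetical names; the real generated code is the C++ emitted from FUNCTION_DEFINITION below).

```python
def mul_backward0_apply_functional(grads, needs_input_grad, other, self_):
    # Pure function: everything it needs arrives as an argument,
    # so it does not rely on any global or engine state.
    grad_inputs = [None, None]
    if needs_input_grad[0]:
        grad_inputs[0] = grads[0] * other   # gradient w.r.t. self
    if needs_input_grad[1]:
        grad_inputs[1] = grads[0] * self_   # gradient w.r.t. other
    return grad_inputs


class MulBackward0:
    def apply(self, grads):
        # Stateful wrapper: asks the engine which inputs need gradients,
        # unpacks the saved variables, then defers to the pure function.
        needs_input_grad = [self.should_compute_output(i) for i in (0, 1)]
        return mul_backward0_apply_functional(
            grads, needs_input_grad, self.other_, self.self_
        )
```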
+FUNCTION_DEFINITION = CodeTemplate( + """\ +static variable_list ${op}_apply_functional( + variable_list&& grads, + std::array needs_input_grad${,apply_functional_args_signature}) +{ + IndexRangeGenerator gen; + ${compute_index_ranges} + variable_list grad_inputs(gen.size()); + ${body} + return grad_inputs; +} +inline variable_list ${op}_apply_functional_ivalue(const variable_list& grads, const ivalue_list& args) +{ +#ifdef C10_MOBILE + TORCH_INTERNAL_ASSERT(false, "compiled autograd doesn't work on mobile"); +#else + auto packed_args = PackedArgs(args); + auto needs_input_grad = packed_args.unpack>(); + ${unpack_ivalues} + return ${op}_apply_functional(variable_list(grads), needs_input_grad${,apply_functional_args}); +#endif +} + +variable_list ${op}::apply(variable_list&& grads) { + ${thread_lock} + ${asserts} + ${unpacks} + ${compute_needs_input_grad} + return ${op}_apply_functional(std::move(grads), needs_input_grad${,apply_functional_args}); +} + +void ${op}::compiled_args(CompiledNodeArgs& args) const { + ${compiled_args} +} +variable_list ${op}::apply_with_saved(const variable_list& grads, SwapSavedVariables& saved) { +#ifdef C10_MOBILE + TORCH_INTERNAL_ASSERT(false, "compiled autograd doesn't work on mobile"); +#else + ${apply_with_saved_before} + + static bool called = false; + if (!called) { + called = true; + ${compute_schema} + const auto& pyinterface = torch::dynamo::autograd::getPyCompilerInterface(); + pyinterface->bind_function(saved.get_py_compiler(), name(), ${op}_apply_functional_ivalue, schema); + } + + variable_list output_result; + + PackedArgs packed_args; + ${asserts} + ${unpacks} + ${compute_needs_input_grad} + packed_args.pack(needs_input_grad); + ${get_packed_args} + + output_result = compiled_autograd_apply_functional(packed_args, next_edges(), saved, grads, name()); + + ${apply_with_saved_after} + return output_result; +#endif +} + +""" +) + +GRAD_INPUT_MASK = CodeTemplate( + """\ + auto grad_input_mask = std::array{ + ${masks} + }; +""" +) + +COMPUTE_NEEDS_INPUT_GRAD = CodeTemplate( + """\ +IndexRangeGenerator gen; +${compute_index_ranges} +auto needs_input_grad = std::array{ + ${masks} +};\ +""" +) + + +DERIVATIVE_SINGLE = CodeTemplate( + """\ +if (needs_input_grad[/*${name}*/${idx}]) { + auto grad_result = ${derivative}; + copy_range(grad_inputs, ${name}_ix, grad_result); +} +""" +) + +# note(crcrpar): `self` argument and other optional positional argument +# of foreach functions are basically a list of n `Tensor`s thus iterating over +# `grads` in order to utilize and apply the existing derivative definitions +# to each `Tensor`(s) of `self`, and the others. 
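In other words, the template that follows wraps a single-Tensor derivative formula in a loop over `grads`. A Python-level sketch of the same scheme (editor's illustration with hypothetical helper names):

```python
def foreach_backward(grads, per_tensor_derivative):
    # Reuse one per-Tensor derivative across every element of the list
    # input, emitting an empty placeholder wherever the incoming grad
    # is undefined (mirrors DERIVATIVE_SINGLE_FOREACH below).
    grad_result = []
    for g in grads:
        grad_result.append(per_tensor_derivative(g) if g is not None else None)
    return grad_result
```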
+DERIVATIVE_SINGLE_FOREACH = CodeTemplate( + """\ +if (needs_input_grad[/*${name}*/${idx}]) { // ${name} + std::vector grad_result; + grad_result.reserve(grads.size()); + for (const auto & i : c10::irange(grads.size())) { + if (grads[i].defined()) { + grad_result.emplace_back(${derivative}); + } else { + grad_result.emplace_back(Tensor()); + } + } + copy_range(grad_inputs, ${name}_ix, grad_result); +} +""" +) + +DERIVATIVE_MULTI_COPY_RANGE = CodeTemplate( + """\ + if (needs_input_grad[/*${name}*/${idx}]) { + copy_range(grad_inputs, ${name}_ix, std::get<${i}>(grad_result)); + } +""" +) + +DERIVATIVE_MULTI = CodeTemplate( + """\ +if (${needs_input_grad}) { + ${grad_input_mask} + auto grad_result = ${derivative}; + ${copy_ranges} +} +""" +) + +# Generates python bindings +# +# This generates the definitions for: +# (1) The PyTypeObject for each backward grad_fn subclassing Node +# (2) The entry for PyTypeObject's tp_getset slot (an array of PyGetSetDef structs) +# We generate one PyGetSetDef struct for each of grad_fn's saved inputs and outputs +# Each PyGetSetDef has a function ptr to a getter, also defined here (3). +# (3) Getters for each of grad_fn's saved inputs and outputs. +# +PY_FUNCTION_DEFINITION = CodeTemplate( + """\ +static PyTypeObject ${op}Class; +addClass<${op}>(module, ${op}Class, "${op}", ${op}_properties); +""" +) + +PY_FUNCTION_PROPS_AND_GETTERS = CodeTemplate( + """\ +${all_getter_definitions} + +static struct PyGetSetDef ${op}_properties[] = { + THP_FUNCTION_DEFAULT_PROPERTIES, + ${all_getsetdef_structs} + {nullptr} /* sentinel */ +}; + +""" +) + +PY_GETSETDEF_STRUCT = CodeTemplate( + """\ +{(char*)"_saved_${name}", (getter)THP${op}_${name}_getter, nullptr, nullptr, nullptr}""" +) + +PY_RAW_GETSETDEF_STRUCT = CodeTemplate( + """\ +{(char*)"_raw_saved_${name}", (getter)THP${op}_${name}_raw_getter, nullptr, nullptr, nullptr}""" +) + +# Getter templates +GETTER_DEFINITION = CodeTemplate( + """\ +static PyObject* THP${op}_${name}_getter(THPCppFunction *self, void *_unused) { + HANDLE_TH_ERRORS + auto prop = static_cast<${op}*>(self->cdata.get())->${name}; + ${body} + END_HANDLE_TH_ERRORS +} +""" +) + +GETTER_DEFINITION_SAVEDVAR = CodeTemplate( + """\ +static PyObject* THP${op}_${name}_getter(THPCppFunction *self, void *_unused) { + HANDLE_TH_ERRORS + const auto& prop = static_cast<${op}*>(self->cdata.get())->${name}_; + ${body} + END_HANDLE_TH_ERRORS +} +""" +) + +GETTER_DEFINITION_RAW_SAVEDVAR = CodeTemplate( + """\ +static PyObject* THP${op}_${name}_raw_getter(THPCppFunction *self, void *_unused) { + HANDLE_TH_ERRORS + const auto& prop = static_cast<${op}*>(self->cdata.get())->${name}_; + ${body} + END_HANDLE_TH_ERRORS +} +""" +) + +GETTER_DEFINITION_VEC_SAVEDVAR = CodeTemplate( + """\ +static PyObject* THP${op}_${name}_getter(THPCppFunction *self, void *_unused) { + HANDLE_TH_ERRORS + const auto *node = static_cast<${op}*>(self->cdata.get()); + const auto& prop = node->${name}_; + if (node->${name}_released_) { + PyErr_SetString(PyExc_RuntimeError, ERR_BACKWARD_TWICE); + return nullptr; + } + ${body} + END_HANDLE_TH_ERRORS +} +""" +) + +GETTER_DEFINITION_RAW_VEC_SAVEDVAR = CodeTemplate( + """\ +static PyObject* THP${op}_${name}_raw_getter(THPCppFunction *self, void *_unused) { + HANDLE_TH_ERRORS + const auto *node = static_cast<${op}*>(self->cdata.get()); + const auto& prop = node->${name}_; + if (node->${name}_released_) { + PyErr_SetString(PyExc_RuntimeError, ERR_BACKWARD_TWICE); + return nullptr; + } + ${body} + END_HANDLE_TH_ERRORS +} +""" +) + +GETTER_DEFINITION_OPT 
= CodeTemplate( + """\ +static PyObject* THP${op}_${name}_getter(THPCppFunction *self, void *_unused) { + HANDLE_TH_ERRORS + auto opt_prop = static_cast<${op}*>(self->cdata.get())->${name}; + if (!opt_prop.has_value()) { + Py_RETURN_NONE; + } + auto prop = opt_prop.value(); + ${body} + END_HANDLE_TH_ERRORS +} +""" +) + +GETTER_DEFINITION_OPT_ARRAYREF = CodeTemplate( + """\ +static PyObject* THP${op}_${name}_getter(THPCppFunction *self, void *_unused) { + HANDLE_TH_ERRORS + auto opt_prop = static_cast<${op}*>(self->cdata.get())->${name}; + if (!opt_prop.list.has_value()) { + Py_RETURN_NONE; + } + auto prop = opt_prop.list.value(); + ${body} + END_HANDLE_TH_ERRORS +} +""" +) + +# Getter body +GETTER_BODY_SAVEDVAR = """\ +return THPVariable_Wrap(prop.unpack(self->cdata)); +""" + +GETTER_BODY_RAW_SAVEDVAR = """\ +pybind11::object obj = pybind11::cast(prop, pybind11::return_value_policy::reference); +return obj.release().ptr(); +""" + +GETTER_BODY_VEC_SAVEDVAR = """\ +PyObject* tup = PyTuple_New((Py_ssize_t) prop.size()); +for (auto i: c10::irange(prop.size())) { + PyTuple_SetItem(tup, (Py_ssize_t) i, THPVariable_Wrap(prop[i].unpack(self->cdata))); +} +return tup; +""" + +GETTER_BODY_RAW_VEC_SAVEDVAR = """\ +PyObject* tup = PyTuple_New((Py_ssize_t) prop.size()); +for (auto i : c10::irange(prop.size())) { + pybind11::object obj = pybind11::cast(prop[i], pybind11::return_value_policy::reference); + PyTuple_SetItem(tup, (Py_ssize_t) i, obj.release().ptr()); +} +return tup; +""" + +GETTER_BODY_ARRAYREF_LONG = """\ +PyObject* tup = PyTuple_New((Py_ssize_t) prop.size()); +for (auto i : c10::irange(prop.size())) { + PyTuple_SetItem(tup, (Py_ssize_t) i, PyLong_FromUnsignedLong((uint64_t) prop[i])); +} +return tup; +""" + +GETTER_BODY_ARRAYREF_SYMINT = """\ +PyObject* tup = PyTuple_New((Py_ssize_t) prop.size()); +for (auto i : c10::irange(prop.size())) { + auto si = prop[i]; + if (auto m = si.maybe_as_int()) { + PyTuple_SetItem(tup, (Py_ssize_t) i, PyLong_FromUnsignedLong(*m)); + } else { + auto py_symint = py::cast(si).release().ptr(); + PyTuple_SetItem(tup, (Py_ssize_t) i, py_symint); + } +} +return tup; +""" + +GETTER_BODY_ARRAYREF_DOUBLE = """\ +PyObject* tup = PyTuple_New((Py_ssize_t) prop.size()); +for (auto i : c10::irange(prop.size())) { + PyTuple_SetItem(tup, (Py_ssize_t) i, PyFloat_FromDouble((double) prop[i])); +} +return tup; +""" + +GETTER_BODY_INT64_T = """\ +return PyLong_FromUnsignedLong((int64_t) prop); +""" + +GETTER_BODY_SYMINT = """\ +if (auto m = prop.maybe_as_int()) { + return PyLong_FromUnsignedLong(*m); +} else { + return py::cast(prop).release().ptr(); +} +""" + +GETTER_BODY_DOUBLE = """\ +return PyFloat_FromDouble((double) prop); +""" + +GETTER_BODY_BOOL = """\ +if (prop) { + Py_RETURN_TRUE; +} else { + Py_RETURN_FALSE; +} +""" + +GETTER_BODY_STRING = """\ +return PyUnicode_FromStringAndSize(prop.data(), prop.size()); +""" + +GETTER_BODY_SCALAR = """\ +if (prop.isComplex()) { + auto cprop = prop.to>(); + return PyComplex_FromDoubles(cprop.real(), cprop.imag()); +} else if (prop.isFloatingPoint()) { + return PyFloat_FromDouble(prop.to()); +} else if (prop.isIntegral(/*includeBool=*/false)) { + return PyLong_FromLong(prop.to()); +} else if (prop.isBoolean()) { + if (prop.to()) { + Py_RETURN_TRUE; + } else { + Py_RETURN_FALSE; + } +} else { + PyErr_SetString(PyExc_RuntimeError, "Unknown scalar type"); + return nullptr; +} +""" + + +GETTER_BODY_VEC_SCALAR = """\ +PyObject* tup = PyTuple_New((Py_ssize_t) prop.size()); +for (auto i: c10::irange(prop.size())) { + if 
(prop[i].isComplex()) { + auto cprop = prop[i].to>(); + PyTuple_SetItem(tup, (Py_ssize_t) i, PyComplex_FromDoubles(cprop.real(), cprop.imag())); + } else if (prop[i].isFloatingPoint()) { + auto double_prop = prop[i].to(); + PyTuple_SetItem(tup, (Py_ssize_t) i, PyFloat_FromDouble(double_prop)); + } else if (prop[i].isIntegral(/*includeBool=*/false)) { + auto long_prop = prop[i].to(); + PyTuple_SetItem(tup, (Py_ssize_t) i, PyLong_FromLong(long_prop)); + } else if (prop[i].isBoolean()) { + if (prop[i].to()) { + PyTuple_SetItem(tup, (Py_ssize_t) i, Py_True); + } else { + PyTuple_SetItem(tup, (Py_ssize_t) i, Py_False); + } + } else { + PyErr_SetString(PyExc_RuntimeError, "Unknown scalar type"); + return nullptr; + } +} +return tup; +""" + + +MISC_GETTER_DEFS = { + OptionalCType(BaseCType(longT)): (GETTER_DEFINITION_OPT, GETTER_BODY_INT64_T), + OptionalCType(BaseCType(SymIntT)): (GETTER_DEFINITION_OPT, GETTER_BODY_SYMINT), + BaseCType(doubleT): (GETTER_DEFINITION, GETTER_BODY_DOUBLE), + OptionalCType(BaseCType(doubleT)): (GETTER_DEFINITION_OPT, GETTER_BODY_DOUBLE), + BaseCType(boolT): (GETTER_DEFINITION, GETTER_BODY_BOOL), + BaseCType(scalarT): (GETTER_DEFINITION, GETTER_BODY_SCALAR), + OptionalCType(BaseCType(scalarT)): (GETTER_DEFINITION_OPT, GETTER_BODY_SCALAR), +} + +# These functions have backwards which cannot be traced, and so must have +# their backward functions traced opaquely. +# VIEW_FUNCTIONS are not traceable because they use as_strided, which +# has an untraceable backwards, see +# https://github.com/pytorch/pytorch/issues/4250 +# TODO: This is probably not exhaustive, but it's a start +UNTRACEABLE_FUNCTIONS = VIEW_FUNCTIONS + + +def get_infos_with_derivatives_list( + differentiability_infos: dict[FunctionSchema, dict[str, DifferentiabilityInfo]], +) -> list[DifferentiabilityInfo]: + diff_info_list = [ + info + for diffinfo_dict in differentiability_infos.values() + for info in diffinfo_dict.values() + ] + + return list(filter(lambda info: info.args_with_derivatives, diff_info_list)) + + +def gen_autograd_functions_lib( + out: str, + differentiability_infos: dict[FunctionSchema, dict[str, DifferentiabilityInfo]], + template_path: str, +) -> None: + """Functions.h and Functions.cpp body + + These contain the auto-generated subclasses of torch::autograd::Node + for each every differentiable torch function. + """ + + # get a 1D list of diffinfos, we do not need them to be per FunctionSchema/DispatchKey here + # infos with the diff dispatchkeys but the same name will still be in the same shard. 
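The invariant stated in this comment follows from keying shards on `info.name` (as `gen_autograd_functions_python` below does via `fm.write_sharded`). A minimal sketch of deterministic name-keyed bucketing, under the assumption that the shard index is a pure function of the key:

```python
def shard_index(name: str, num_shards: int = 5) -> int:
    # Deterministic digest of the name, so equal names always share a
    # shard and repeated builds produce identical files.
    return sum(name.encode()) % num_shards

buckets: dict[int, list[str]] = {}
for name in ["MulBackward0", "MulBackward0", "AddBackward0"]:
    buckets.setdefault(shard_index(name), []).append(name)
# Both "MulBackward0" infos necessarily land in the same bucket.
```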
+ infos = get_infos_with_derivatives_list(differentiability_infos) + declarations = [process_function(f, FUNCTION_DECLARATION) for f in infos] + definitions = [process_function(f, FUNCTION_DEFINITION) for f in infos] + + file_basename = "Functions" + fm = FileManager(install_dir=out, template_dir=template_path, dry_run=False) + for suffix in [".h", ".cpp"]: + fname = file_basename + suffix + fm.write_with_template( + fname, + fname, + lambda: { + "generated_comment": "@" + + f"generated from {fm.template_dir_for_comments()}/{fname}", + "autograd_function_declarations": declarations, + "autograd_function_definitions": definitions, + }, + ) + + +def gen_autograd_functions_python( + out: str, + differentiability_infos: dict[FunctionSchema, dict[str, DifferentiabilityInfo]], + template_path: str, +) -> None: + fm = FileManager(install_dir=out, template_dir=template_path, dry_run=False) + num_shards = 5 + fm.write( + "python_functions.h", + lambda: { + "generated_comment": "@" + + f"generated from {fm.template_dir_for_comments()}/python_functions.h", + "shard_forward_declare": [ + f"void initialize_autogenerated_functions_{i}(PyObject* module);" + for i in range(num_shards) + ], + "shard_call": [ + f"initialize_autogenerated_functions_{i}(module);" + for i in range(num_shards) + ], + }, + ) + + # get a 1D list of diffinfos, we do not need them to be per FunctionSchema/DispatchKey here + # infos with the diff dispatchkeys but the same name will still be in the same shard. + infos = get_infos_with_derivatives_list(differentiability_infos) + fm.write_sharded( + "python_functions.cpp", + infos, + key_fn=lambda info: info.name, + base_env={ + "generated_comment": "@" + + f"generated from {fm.template_dir_for_comments()}/python_functions.cpp", + }, + env_callable=lambda info: { + "py_function_initializers": [ + process_function(info, PY_FUNCTION_DEFINITION) + ], + "py_function_props_and_getters": [ + process_function(info, PY_FUNCTION_PROPS_AND_GETTERS) + ], + }, + num_shards=num_shards, + sharded_keys={"py_function_initializers", "py_function_props_and_getters"}, + ) + + +def process_function(info: DifferentiabilityInfo, template: CodeTemplate) -> str: + saved_variables: list[str] = [] + release_variables: list[str] = [] + saved_list_sizes: list[str] = [] + unpack: list[str] = [] + asserts: list[str] = [] + compute_index_ranges: list[str] = [] + getter_definitions: list[str] = [] + py_getsetdef_structs: list[str] = [] + compiled_args: list[str] = [] + apply_with_saved_before: list[str] = [] + apply_with_saved_after: list[str] = [] + apply_functional_args: list[str] = [] + apply_functional_args_ref_types: list[str] = [] + # Maps the name of an input (to the original forward operator; + # examples are "self", "other") to the order in which they appear in the + # operator. + # For example; if the operator is foo(Tensor self, int64_t k, Tensor other), + # the mapping is: {"self": 0, "other": 1}. + # We use this mapping to populate needs_input_grad in some order and then grab + # values from it. 
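As a quick illustration of how this mapping is consumed (editor's sketch; the values are hypothetical), the runtime mask is indexed by the positions recorded here:

```python
input_name_to_idx = {"self": 0, "other": 1}  # built by the enumerate() loop below

# At runtime the engine supplies one flag per differentiable input...
needs_input_grad = (True, False)

# ...and each derivative formula is guarded by the flag at its input's index.
if needs_input_grad[input_name_to_idx["other"]]:
    pass  # compute and copy the gradient for "other"
```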
+    input_name_to_idx: dict[str, int] = {}
+
+    for idx, arg in enumerate(info.args_with_derivatives):
+        if arg.type in TENSOR_LIST_LIKE_CTYPES:
+            size = f"{arg.name}_size_"
+            saved_list_sizes.append(f"size_t {arg.name}_size_;")
+            apply_functional_args.append(f"{arg.name}_size_")
+            apply_functional_args_ref_types.append("size_t")
+        else:
+            size = "1"
+        compute_index_ranges.append(f"auto {arg.name}_ix = gen.range({size});")
+        input_name_to_idx[arg.name] = idx
+
+    def save_var(var: SavedAttribute, is_output: bool) -> None:
+        name = var.nctype.name
+        type = var.nctype.type
+        should_append_getsetdef = True
+        should_append_raw_getsetdef = False
+        visit_name = name
+        uses_cpp_saved_variable_cls = False
+        unpacked_ref_type = None
+
+        if (
+            type == BaseCType(tensorT)
+            or type == OptionalCType(BaseCType(tensorT))
+            or type == MutRefCType(OptionalCType(BaseCType(tensorT)))
+            or (type == BaseCType(scalarT) and is_output)
+        ):
+            uses_cpp_saved_variable_cls = True
+            saved_variables.append(f"SavedVariable {name}_;")
+            release_variables.append(f"{name}_.reset_data();")
+            ptr = "shared_from_this()" if is_output else ""
+            unpack.append(f"auto {name} = {name}_.unpack({ptr});")
+            getter_definitions.append(
+                GETTER_DEFINITION_SAVEDVAR.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_SAVEDVAR
+                )
+            )
+            getter_definitions.append(
+                GETTER_DEFINITION_RAW_SAVEDVAR.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_RAW_SAVEDVAR
+                )
+            )
+            should_append_raw_getsetdef = True
+            visit_name = f"{name}_"
+            unpacked_ref_type = "Tensor&"
+        elif (
+            type == BaseCType(tensorListT)
+            or type == BaseCType(iTensorListRefT)
+            or type == VectorCType(BaseCType(tensorT))
+        ):
+            # note(crcrpar): [nuanced return type of out-of-place foreach functions]
+            # When an out-of-place foreach function whose return signature is `Tensor[]`
+            # spells out its backward definitions in `derivatives.yaml`, and some of them depend on
+            # `result`, `result`'s type is interpreted and treated as `std::vector<Tensor>`.
+            # An out-of-place foreach whose backwards rely on their output doesn't suffer from this
+            # difference if the definitions are codegen'ed.
+            # This special case is needed for `_foreach_pow.List` and `_foreach_pow.ScalarAndTensor`
+            # as of https://github.com/pytorch/pytorch/pull/105504.
+            if type == VectorCType(BaseCType(tensorT)):
+                assert (
+                    info.func.func.name.name.base.startswith("_foreach") and is_output
+                )
+            uses_cpp_saved_variable_cls = True
+            saved_variables.append(f"std::vector<SavedVariable> {name}_;")
+            saved_variables.append(f"bool {name}_released_ = false;")
+            # Just clear() is sufficient, we don't need to loop and clear each variable.
+            # Because the SavedVariable owns a tensor and a grad_fn, removing the SavedVariable makes them go away as well.
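+            # NOTE(editor): for a saved tensor list named e.g. "tensors", the
+            # appends above/below emit roughly (illustrative reconstruction):
+            #   std::vector<SavedVariable> tensors_;
+            #   bool tensors_released_ = false;
+            #   tensors_.clear(); tensors_released_ = true;    // on release
+            #   auto tensors = unpack_list(tensors_, nullptr);  // non-output case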
+            release_variables.append(f"{name}_.clear();")
+            release_variables.append(f"{name}_released_ = true;")
+            ptr = "shared_from_this()" if is_output else "nullptr"
+            unpack.append(f"auto {name} = unpack_list({name}_, {ptr});")
+            asserts.append(f"TORCH_CHECK(!{name}_released_, ERR_BACKWARD_TWICE);")
+            getter_definitions.append(
+                GETTER_DEFINITION_VEC_SAVEDVAR.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_VEC_SAVEDVAR
+                )
+            )
+            getter_definitions.append(
+                GETTER_DEFINITION_RAW_VEC_SAVEDVAR.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_RAW_VEC_SAVEDVAR
+                )
+            )
+            should_append_raw_getsetdef = True
+            visit_name = f"{name}_"
+            unpacked_ref_type = "std::vector<Tensor>&"
+        elif type == ListCType(OptionalCType(BaseCType(tensorT))):
+            uses_cpp_saved_variable_cls = True
+            saved_variables.append(f"std::vector<SavedVariable> {name}_;")
+            saved_variables.append(f"bool {name}_released_ = false;")
+            # Just clear() is sufficient, we don't need to loop and clear each variable.
+            # Because the SavedVariable owns a tensor and a grad_fn, removing the SavedVariable makes them go away as well.
+            release_variables.append(f"{name}_.clear();")
+            release_variables.append(f"{name}_released_ = true;")
+            unpack.append(f"auto {name} = unpack_opt_list({name}_);")
+            asserts.append(f"TORCH_CHECK(!{name}_released_, ERR_BACKWARD_TWICE);")
+            getter_definitions.append(
+                GETTER_DEFINITION_VEC_SAVEDVAR.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_VEC_SAVEDVAR
+                )
+            )
+            getter_definitions.append(
+                GETTER_DEFINITION_RAW_VEC_SAVEDVAR.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_RAW_VEC_SAVEDVAR
+                )
+            )
+            should_append_raw_getsetdef = True
+            visit_name = f"{name}_"
+            unpacked_ref_type = "torch::List<std::optional<Tensor>>&"
+        elif type == BaseCType(intArrayRefT):
+            saved_variables.append(f"std::vector<int64_t> {name};")
+            getter_definitions.append(
+                GETTER_DEFINITION.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_ARRAYREF_LONG
+                )
+            )
+        elif type == BaseCType(symIntArrayRefT):
+            saved_variables.append(f"std::vector<c10::SymInt> {name};")
+            getter_definitions.append(
+                GETTER_DEFINITION.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_ARRAYREF_SYMINT
+                )
+            )
+        elif type == BaseCType(optionalIntArrayRefT):
+            saved_variables.append(f"c10::OptionalArray<int64_t> {name};")
+            getter_definitions.append(
+                GETTER_DEFINITION_OPT_ARRAYREF.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_ARRAYREF_LONG
+                )
+            )
+        elif type == BaseCType(optionalSymIntArrayRefT):
+            saved_variables.append(f"c10::OptionalArray<c10::SymInt> {name};")
+            getter_definitions.append(
+                GETTER_DEFINITION_OPT_ARRAYREF.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_ARRAYREF_SYMINT
+                )
+            )
+        elif type == OptionalCType(BaseCType(intArrayRefT)):
+            saved_variables.append(f"c10::OptionalArray<int64_t> {name};")
+            getter_definitions.append(
+                GETTER_DEFINITION_OPT_ARRAYREF.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_ARRAYREF_LONG
+                )
+            )
+        elif type == OptionalCType(BaseCType(symIntArrayRefT)):
+            saved_variables.append(f"c10::OptionalArray<c10::SymInt> {name};")
+            getter_definitions.append(
+                GETTER_DEFINITION_OPT_ARRAYREF.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_ARRAYREF_SYMINT
+                )
+            )
+        elif type == OptionalCType(ArrayRefCType(BaseCType(doubleT))):
+            saved_variables.append(f"c10::OptionalArray<double> {name};")
+            getter_definitions.append(
+                GETTER_DEFINITION_OPT_ARRAYREF.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_ARRAYREF_DOUBLE
+                )
+            )
+        elif type == BaseCType(longT):
+            saved_variables.append(f"{type.cpp_type()} {name} = 0;")
+            getter_definitions.append(
+                GETTER_DEFINITION.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_INT64_T
+                )
+            )
+        elif type == BaseCType(SymIntT):
+            saved_variables.append(f"c10::SymInt {name};")
+            getter_definitions.append(
+                GETTER_DEFINITION.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_SYMINT
+                )
+            )
+        elif type == BaseCType(stringT):
+            saved_variables.append(f"std::string {name};")
+            getter_definitions.append(
+                GETTER_DEFINITION.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_STRING
+                )
+            )
+        elif type == OptionalCType(BaseCType(stringT)):
+            saved_variables.append(f"std::optional<std::string> {name};")
+            getter_definitions.append(
+                GETTER_DEFINITION_OPT.substitute(
+                    op=info.op, name=name, body=GETTER_BODY_STRING
+                )
+            )
+        elif type == ArrayRefCType(
+            elem=BaseCType(type=BaseCppType(ns="at", name="Scalar"))
+        ):
+            saved_variables.append(f"std::vector<at::Scalar> {name};")
+            unpacked_ref_type = "std::vector<at::Scalar>&"
+            saved_variables.append(f"bool {name}_released_ = false;")
+            # Just clear() is sufficient, we don't need to loop and clear each variable.
+            # Because the SavedVariable owns a tensor and a grad_fn, removing the SavedVariable makes them go away as well.
+            release_variables.append(f"{name}.clear();")
+            # release_variables.append(f"{name}_released_ = true;")
+            # unpack.append(f"auto {name} = unpack_list({name}_);")
+            # asserts.append(f"TORCH_CHECK(!{name}_released_, ERR_BACKWARD_TWICE);")
+            getter_definitions.append(
+                CodeTemplate(
+                    """\
+static PyObject* THP${op}_${name}_getter(THPCppFunction *self, void *_unused) {
+  HANDLE_TH_ERRORS
+  const auto *node = static_cast<${op}*>(self->cdata.get());
+  const auto& prop = node->${name};
+  if (node->${name}_released_) {
+    PyErr_SetString(PyExc_RuntimeError, ERR_BACKWARD_TWICE);
+    return nullptr;
+  }
+  ${body}
+  END_HANDLE_TH_ERRORS
+}
+                    """
+                ).substitute(
+                    op=info.op,
+                    name=name,
+                    body=GETTER_BODY_VEC_SCALAR,
+                )
+            )
+        else:
+            # Check for indicators that you're putting a non-owning reference
+            # into the saved variable field. If this is spuriously firing,
+            # edit this field. Otherwise, you probably need to add a case
+            # above.
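+            # NOTE(editor): the assertion below distills to this predicate
+            # (restated here for clarity; not a separate helper in this file):
+            #   t = type.cpp_type()
+            #   "ref" in t.lower() or "view" in t.lower() or "*" in t or "&" in t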
+ assert ( + "ref" not in type.cpp_type().lower() + and "view" not in type.cpp_type().lower() + and "*" not in type.cpp_type() + and "&" not in type.cpp_type() + ), f"{type.cpp_type()} looks like it contains a non-owning reference" + saved_variables.append(f"{type.cpp_type()} {name};") + + if type in MISC_GETTER_DEFS: + # pyrefly: ignore [index-error] + getter_def, body = MISC_GETTER_DEFS[type] + getter_definitions.append( + getter_def.substitute(op=info.op, name=name, body=body) + ) + else: + # Types we don't expose python bindings to yet: + # TypeAndSize, at::ScalarType, TensorOptions, TensorGeometry, + # std::vector>, std::vector + should_append_getsetdef = False + + if should_append_getsetdef: + py_getsetdef_structs.append( + PY_GETSETDEF_STRUCT.substitute(op=info.op, name=name) + ) + if should_append_raw_getsetdef: + py_getsetdef_structs.append( + PY_RAW_GETSETDEF_STRUCT.substitute(op=info.op, name=name) + ) + + if uses_cpp_saved_variable_cls: + compiled_args.append( + f"args.collect({visit_name}, {'true' if is_output else 'false'});" + ) + else: + compiled_args.append(f"args.collect({visit_name});") + apply_with_saved_before.append(f"saved.before({visit_name});") + apply_with_saved_after.append(f"saved.after({visit_name});") + + if unpacked_ref_type is None: + unpacked_ref_type = f"{saved_variables[-1].split(' ')[0]}&" + apply_functional_args.append(str(name)) + apply_functional_args_ref_types.append(unpacked_ref_type) + + for var in sorted(info.all_saved_inputs, key=lambda sa: str(sa.nctype.name)): + save_var(var, is_output=False) + for var in sorted(info.all_saved_outputs, key=lambda sa: str(sa.nctype.name)): + save_var(var, is_output=True) + + # lock the mutex when we release variables and in Node::apply to protect thread safety + # see Note [Thread Safety on Autograd Node] + if len(release_variables) > 0: + thread_lock = "std::lock_guard lock(mutex_);" + else: + thread_lock = "" + + if uses_retain_variables(info): + apply_functional_args.append("retain_variables") + apply_functional_args_ref_types.append("bool") + will_release_variables = WILL_RELEASE_VARIABLES.substitute() + else: + will_release_variables = "" + + body: list[str] = [] + + if uses_single_grad(info): + body.append("const auto& grad = grads[0];") + else: + # Generate aliases for gradients named for returned values. + body.extend( + f"const auto& {name} = grads[{info.available_named_gradients.index(name)}];" + for name in sorted(info.used_named_gradients) + ) + + def emit_derivative( + derivative: Derivative, + args_with_derivatives: Sequence[Binding], + ) -> tuple[bool, str]: + formula = derivative.formula + var_names = derivative.var_names + + if len(var_names) == 1: + checks_any_grad_defined = False + if "not_implemented" not in formula: + matching_args = [ + arg for arg in args_with_derivatives if arg.name == var_names[0] + ] + if len(matching_args) == 1: + # We can add undefined grad support if the input variable is a Tensor + arg = matching_args[0] + if isinstance(arg.argument, Argument) and str( + arg.argument.type + ) in ("Tensor", "Tensor?"): + formula = "any_grad_defined ? 
(" + formula + ") : Tensor()" + checks_any_grad_defined = True + if info.name.startswith("_foreach_"): + derivative_template = DERIVATIVE_SINGLE_FOREACH + else: + derivative_template = DERIVATIVE_SINGLE + return ( + checks_any_grad_defined, + derivative_template.substitute( + name=var_names[0], + derivative=formula, + idx=input_name_to_idx[var_names[0]], + ), + ) + + else: + if "grad_input_mask" in formula: + masks = [ + f"needs_input_grad[{input_name_to_idx[name]}]," + for name in var_names + ] + grad_input_mask = GRAD_INPUT_MASK.substitute( + n=len(var_names), masks=masks + ) + else: + grad_input_mask = "" + needs_input_grad = [ + f"needs_input_grad[{input_name_to_idx[name]}]" for name in var_names + ] + needs_input_grad = " || ".join(needs_input_grad) + copy_ranges: list[str] = [] + for i, n in enumerate(var_names): + copy_ranges.append( + DERIVATIVE_MULTI_COPY_RANGE.substitute( + name=n, i=i, idx=input_name_to_idx[n] + ) + ) + return False, DERIVATIVE_MULTI.substitute( + needs_input_grad=needs_input_grad, + copy_ranges=copy_ranges, + derivative=formula, + grad_input_mask=grad_input_mask, + ) + + masks = [] + + need_any_grad_defined_var = False + for derivative in info.derivatives: + checks_any_grad_defined, derivative_text = emit_derivative( + derivative, info.args_with_derivatives + ) + body.append(derivative_text) + need_any_grad_defined_var |= checks_any_grad_defined + + for name in input_name_to_idx: + masks.append(f"task_should_compute_output({{ {name}_ix }}),") + + # Since single-output derivative formulas need to check if grads are + # defined, only perform the check once, before all the formulas + if need_any_grad_defined_var: + body.insert( + -len(info.derivatives), + "bool any_grad_defined = any_variable_defined(grads);", + ) + + if info.name in UNTRACEABLE_FUNCTIONS: + superclass = "Node" + else: + superclass = "TraceableFunction" + + all_getsetdef_structs = ( + ",\n".join(py_getsetdef_structs) + "," if len(py_getsetdef_structs) != 0 else "" + ) + all_getter_definitions = "\n".join(getter_definitions) + + compute_needs_input_grad = COMPUTE_NEEDS_INPUT_GRAD.substitute( + n=len(masks), compute_index_ranges=compute_index_ranges, masks=masks + ) + apply_functional_args_signature = [ + f"{T} {x}" + for T, x in zip(apply_functional_args_ref_types, apply_functional_args) + ] + get_packed_args = "\n".join( + f"packed_args.pack({name});" for name in apply_functional_args + ) + unpack_ivalues = [] + for typ, name in zip(apply_functional_args_ref_types, apply_functional_args): + typ = typ.removesuffix("&") + # pyrefly: ignore [bad-argument-type] + unpack_ivalues.append(f"auto {name} = packed_args.unpack<{typ}>();") + + schema_args = [f"std::array"] + for typ in apply_functional_args_ref_types: + typ = typ.removesuffix("&") + typ = typ.removeprefix("const") + schema_args.append(typ.strip()) + compute_schema = ["std::vector schema = {"] + for schema_arg in schema_args: + compute_schema.append( + f" torch::dynamo::autograd::IValuePacker<{schema_arg}>::packed_type()," + ) + compute_schema.append("};") + + return template.substitute( + unpacks="\n".join(unpack), + op=info.op, + compute_schema="\n".join(compute_schema), + apply_functional_args=apply_functional_args, + apply_functional_args_signature=apply_functional_args_signature, + compute_needs_input_grad=compute_needs_input_grad, + num_inputs=len(input_name_to_idx), + unpack_ivalues="\n".join(unpack_ivalues), + compute_index_ranges=compute_index_ranges, + saved_variables=saved_variables, + release_variables=release_variables, + 
saved_list_sizes=saved_list_sizes, + asserts=asserts, + thread_lock=thread_lock, + will_release_variables=will_release_variables, + body=body, + superclass=superclass, + all_getter_definitions=all_getter_definitions, + all_getsetdef_structs=all_getsetdef_structs, + compiled_args=compiled_args, + apply_with_saved_before=apply_with_saved_before, + apply_with_saved_after=apply_with_saved_after, + get_packed_args=get_packed_args, + ) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_inplace_or_view_type.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_inplace_or_view_type.py new file mode 100644 index 0000000000000000000000000000000000000000..4cb3429c39276ec2ad62ff111e7226512b31596f --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_inplace_or_view_type.py @@ -0,0 +1,673 @@ +# Generates ADInplaceOrViewType.h/cpp +# +# NOTE: If any changes are being made to the ADInplaceOrView codegen please also check +# if updates are needed in torch/csrc/autograd/autograd_not_implemented_fallback.cpp +# The fallback is expected to mimic this codegen, so we should keep the two in sync. + +from __future__ import annotations + +from torchgen.api import cpp +from torchgen.api.autograd import ( + dispatch_strategy, + gen_differentiable_outputs, + NativeFunctionWithDifferentiabilityInfo, +) +from torchgen.api.types import ( + BaseCType, + Binding, + boolT, + ConstRefCType, + CType, + DispatcherSignature, + intArrayRefT, + longT, + OptionalCType, + symIntArrayRefT, + SymIntT, + tensorT, +) +from torchgen.code_template import CodeTemplate +from torchgen.context import with_native_function +from torchgen.model import ( + NativeFunction, + SchemaKind, + SelfArgument, + TensorOptionsArguments, + Type, +) +from torchgen.utils import FileManager + +from .context import with_native_function_with_differentiability_info +from .gen_trace_type import ( + get_return_value, + MANUAL_AUTOGRAD, + tie_return_values, + type_wrapper_name, +) + + +# See NOTE [ Autograd View Variables ] in variable.h for details. +# If you update list VIEW_FUNCTIONS or RETURNS_VIEWS_OF_INPUT, +# you **MUST** also update the public list of view ops accordingly in +# docs/source/tensor_view.rst. Note not all ATen functions are exposed to public, +# e.g alias & sparse_coo_tensor_with_dims_and_tensors. +# +# A map: function name => name of the argument that all outputs are view of + +VIEW_FUNCTIONS_WITH_METADATA_CHANGE = [ + "view_as_complex", + "view_as_real", + "_conj", + "_neg_view", + "_nested_get_values", + "_nested_view_from_buffer", + "_nested_view_from_jagged", +] + +VIEW_FUNCTIONS = { + "numpy_T": "self", + "alias": "self", + "as_strided": "self", + "diagonal": "self", + "expand": "self", + "permute": "self", + "select": "self", + "slice": "self", + "slice_inverse": "self", + "split": "self", + "split_with_sizes": "self", + "squeeze": "self", + "t": "self", + "transpose": "self", + "unfold": "self", + "unsqueeze": "self", + "flatten": "self", + "view": "self", + "unbind": "self", + "_indices": "self", + "_values": "self", + "indices": "self", + "values": "self", + "crow_indices": "self", + "col_indices": "self", + "ccol_indices": "self", + "row_indices": "self", + # sparse_coo ctor output should really be views of both indices and values, + # but we only supports making as view of a single variable, and indices is + # discrete anyways. + # FIXME: clone indices on construction. 
+ "sparse_coo_tensor_with_dims_and_tensors": "values", + "_reshape_alias": "self", + "_test_autograd_multiple_dispatch_view": "self", +} + +for key in VIEW_FUNCTIONS_WITH_METADATA_CHANGE: + VIEW_FUNCTIONS[key] = "self" + +# note: some VIEW_FUNCTIONS are just compositions of the view functions above +# this list contains both the root view functions and any that are purely composed +# of viewing functions, and is used by the JIT to determine when an operator +# may return a view of its inputs; however they may sometimes return a copy. +# (e.g. `contiguous`) +RETURNS_VIEWS_OF_INPUT = set(VIEW_FUNCTIONS.keys()).union( + { + "chunk", + "detach", + "contiguous", + "reshape", + "reshape_as", + "expand_as", + "view_as", + "real", + "imag", + "narrow", + "movedim", + "tensor_split", + "swapdims", + "swapaxes", + "mT", + "mH", + "adjoint", + "matrix_H", + } +) + +# These are the functions we consider views for the purposes of validating +# StorageImpl and TensorImpl in gen_variable_type. +# `_unsafe_view` is not included in VIEW_FUNCTIONS above because it is not a +# view for the purposes of ADInplaceOrView kernel, we do not want to call as_view +# See NOTE [Unsafe View] for more info. +ALL_VIEW_FUNCTIONS = { + **VIEW_FUNCTIONS, + "_unsafe_view": "self", +} + +ARRAYREF_TO_VEC = CodeTemplate( + """\ +auto ${vec} = ${arg}.vec(); +""" +) + +OPTIONAL_TO_VAL = CodeTemplate( + """\ +auto ${val} = ${arg}.value_or(${default}); +""" +) + +CALL_DISPATCH = CodeTemplate( + """\ +at::_ops::${unambiguous_name}::call(${unpacked_args})""" +) + +REVERSE_VIEW_DISPATCH = CodeTemplate( + """\ +${reverse_name}(${unpacked_args})""" +) + +MULTI_OUTPUT_VIEW_ITERATION = CodeTemplate( + """\ +for (auto ${view_idx} : c10::irange(${var}.size())) { + ${body} +} +""" +) + +SETUP_REPLAY_VIEW_IF_NOT_SUPPORT_AS_STRIDED_OR_VIEW_WITH_METADATA_CHANGE = CodeTemplate( + """\ +std::unique_ptr func(nullptr); +std::function rev_func=nullptr; +if (${is_view_with_metadata_change} || + !self.unsafeGetTensorImpl()->support_as_strided() || + self.unsafeGetTensorImpl()->is_python_dispatch() || + c10::AutogradState::get_tls_state().get_view_replay_enabled()) { + ${replay_view_func} + ${reverse_replay_view_func} +} +""" +) + +REPLAY_VIEW_FUNC = CodeTemplate( + """\ +func = std::make_unique<${view_func_name}>(${view_func_args}); +""" +) + +REVERSE_REPLAY_VIEW_LAMBDA_FUNC = CodeTemplate( + """\ +rev_func = [=](const at::Tensor& ${input_view}) { + return ${reverse_replay_view_call}; +}; +""" +) + +METHOD_DEFINITION = CodeTemplate( + """\ +${return_type} ${type_wrapper_name}(${formals}) { + ${type_definition_body} +} +""" +) + +WRAPPER_REGISTRATION = CodeTemplate( + """\ +m.impl("${unqual_operator_name_with_overload}", + TORCH_FN(${class_type}::${type_wrapper_name}) +); +""" +) + +AUTOGRAD_NOT_IMPLEMENTED_REGISTRATION = CodeTemplate( + """\ +m.impl("${unqual_operator_name_with_overload}", torch::autograd::autogradNotImplementedFallback()); +""" +) + +INPLACE_REDISPATCH = CodeTemplate( + """\ +{ + at::AutoDispatchBelowADInplaceOrView guard; + at::_ops::${unambiguous_name}::redispatch(${unpacked_args}); +} +""" +) + +ASSIGN_RETURN_VALUE = CodeTemplate( + """\ +${return_values} = ${rhs_value}; +""" +) + +VIEW_REDISPATCH = CodeTemplate( + """\ +${assign_return_values} ([&]() { + at::AutoDispatchBelowADInplaceOrView guard; + return at::_ops::${unambiguous_name}::redispatch(${unpacked_args}); +})(); +""" +) + +TMP_VAR = "_tmp" + + +# FIXME: Ideally these functions should be methods on Type class, but we have a +# comment in codegen/model.py there saying 
these concepts are not well defined. +# Thus we put a version that commonly used by autograd codegen here. +def is_tensor_type(t: Type) -> bool: + # TODO: Should handle optional here? + return t.is_tensor_like() and t.is_list_like() is None + + +def is_tensor_list_type(t: Type) -> bool: + # TODO: Should handle optional here? + return t.is_tensor_like() and t.is_list_like() is not None + + +UNPACK_TENSOR = CodeTemplate( + """\ +auto${ref} ${arg_name}_ = unpack${suffix}(${arg_name}, "${arg_name}", ${arg_pos});""" +) + + +def unpacked_name(arg_name: str) -> str: + return arg_name + "_" + + +# e.g. select.int -> select_copy_int_inverse() +def inverse_view_name(f: NativeFunction) -> str: + copy_variant = f"{f.root_name}_copy" + overload = f"{f.func.name.overload_name}" + if overload != "": + overload = "_" + overload + return f"{copy_variant}{overload}_inverse" + + +def extract_bindings(f: NativeFunction) -> list[Binding]: + return [ + r + for a in f.func.schema_order_arguments() + for r in cpp.argument( + a, + method=False, + symint=True, + cpp_no_default_args=set(), + faithful=False, + has_tensor_options=False, + ) + ] + + +@with_native_function +def unpack_args(f: NativeFunction) -> tuple[list[str], list[Binding]]: + body: list[str] = [] + unpacked_bindings: list[Binding] = [] + + for i, binding in enumerate(extract_bindings(f)): + assert not isinstance(binding.argument, SelfArgument) + if isinstance(binding.argument, TensorOptionsArguments): + raise RuntimeError("VariableKernel shouldn't take TensorOptions") + + is_nullable = binding.argument.type.is_nullable() + if not binding.argument.type.is_tensor_like() or is_nullable: + unpacked_bindings.append(binding) + continue + + is_tensor_list = is_tensor_list_type(binding.argument.type) + ref = (not is_nullable) and not is_tensor_list + suffix = "_opt" if is_nullable and not is_tensor_list else "" + body.append( + UNPACK_TENSOR.substitute( + arg_name=binding.name, + arg_pos=i, + suffix=suffix, + ref="&" if ref else "", + ) + ) + unpacked_bindings.append( + Binding( + name=unpacked_name(binding.name), + nctype=binding.nctype, + argument=binding.argument, + default=binding.default, + ) + ) + + return body, unpacked_bindings + + +def get_base_name(f: NativeFunction) -> str: + return f.func.name.name.base # TODO: should be str(f.func.name.name)? + + +def get_view_info(f: NativeFunction) -> str | None: + base_name = get_base_name(f) + view_info = VIEW_FUNCTIONS.get(base_name) + if view_info is None and base_name in RETURNS_VIEWS_OF_INPUT: + view_info = "self" + return view_info + + +def emit_view_func( + f: NativeFunction, bindings: list[Binding], view_idx: str | None = None +) -> str: + """Generate an additional lambda function to recover views in backward when as_strided is not supported. + See Note [View + Inplace update for base tensor] and [View + Inplace update for view tensor] for more details. + """ + # TODO: Clean this logic up if we get rid of reverse view funcs or reify them. 
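+    # NOTE(editor): illustrative sketch of what this function emits for a toy
+    # view op with an IntArrayRef argument (the class name below is
+    # hypothetical; the real one comes from gen_view_funcs.view_func_name):
+    #   auto size_vec = size.vec();                      // ARRAYREF_TO_VEC
+    #   func = std::make_unique<SomeViewFunc>(size_vec); // REPLAY_VIEW_FUNC
+    #   rev_func = [=](const at::Tensor& input_view) { ... };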
+ input_base = "input_base" + replay_view_func = "" + updated_args: list[str] = [] + known_view_arg_simple_types: list[CType] = [ + BaseCType(longT), + OptionalCType(BaseCType(longT)), + BaseCType(SymIntT), + OptionalCType(BaseCType(SymIntT)), + BaseCType(boolT), + BaseCType(intArrayRefT), + BaseCType(symIntArrayRefT), + ConstRefCType(BaseCType(tensorT)), + ConstRefCType(OptionalCType(BaseCType(tensorT))), + ] + for binding in bindings: + arg, arg_type = binding.name, binding.nctype.type + if arg == "self": + updated_args.append(input_base) + continue + if arg_type not in known_view_arg_simple_types: + known_types_str = ", ".join([str(t) for t in known_view_arg_simple_types]) + raise TypeError( + f"You are adding an {arg_type} {arg} argument to op {cpp.name(f.func)} in addition to known types: " + f"{known_types_str}. Please update the list or materialize it so that it can be closed " + "over by value, also add a test in pytorch/xla/test/test_operations.py where this code " + "is exercised." + ) + if arg_type == BaseCType(intArrayRefT) or arg_type == BaseCType( + symIntArrayRefT + ): + # It's not safe to close over IntArrayRef by value, since this is a + # reference type, so materialize a vector to close over by value + arg_vec = arg + "_vec" + replay_view_func += ARRAYREF_TO_VEC.substitute(arg=arg, vec=arg_vec) + updated_args.append(arg_vec) + elif arg_type == OptionalCType(BaseCType(longT)): + # Materialize int64_t? to int64_t + arg_value = arg + "_val" + replay_view_func += OPTIONAL_TO_VAL.substitute( + arg=arg, val=arg_value, default="0" + ) + updated_args.append(arg_value) + elif arg_type == ConstRefCType(BaseCType(tensorT)) or arg_type == ConstRefCType( + OptionalCType(BaseCType(tensorT)) + ): + # NB: Closing over a tensor. If a user modifies this tensor, this will be silently + # incorrect. The proper thing to do is to store the version counter and copy on write. + updated_args.append(arg) + else: + updated_args.append(arg) + + from .gen_view_funcs import view_func_name + + view_func_args = [b.name for b in bindings if b.name != "self"] + if view_idx is not None: + view_func_args.append(f"{view_idx}") + replay_view_func += REPLAY_VIEW_FUNC.substitute( + view_func_name=view_func_name(f, include_namespace=True), + view_func_args=view_func_args, + ) + + input_view = "input_view" + reverse_unpacked_args = [ + "self", + f"{input_view}", + # inverse_return_mode= + "at::functionalization::InverseReturnMode::AlwaysView", + *(() if view_idx is None else (f"{view_idx}",)), + # skip input_base arg + *updated_args[1:], + ] + + from torchgen.api.functionalization import reverse_name + + reverse_replay_view_call = REVERSE_VIEW_DISPATCH.substitute( + reverse_name=reverse_name(f, include_namespace=True), + unpacked_args=reverse_unpacked_args, + ) + reverse_replay_view_func = REVERSE_REPLAY_VIEW_LAMBDA_FUNC.substitute( + input_view=input_view, reverse_replay_view_call=reverse_replay_view_call + ) + + is_view_with_metadata_change = ( + "true" if cpp.name(f.func) in VIEW_FUNCTIONS_WITH_METADATA_CHANGE else "false" + ) + + return SETUP_REPLAY_VIEW_IF_NOT_SUPPORT_AS_STRIDED_OR_VIEW_WITH_METADATA_CHANGE.substitute( + is_view_with_metadata_change=is_view_with_metadata_change, + replay_view_func=replay_view_func, + reverse_replay_view_func=reverse_replay_view_func, + ) + + +def emit_view_body( + fn: NativeFunctionWithDifferentiabilityInfo, var: str +) -> tuple[str, str]: + # See NOTE [ Autograd View Variables ] in variable.h for details. 
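+    # NOTE(editor): get_creation_meta_in_mode below nests two runtime checks;
+    # for original == "CreationMeta::DEFAULT" the emitted expression is:
+    #   InferenceMode::is_enabled() ? CreationMeta::INFERENCE_MODE
+    #     : (at::GradMode::is_enabled() ? CreationMeta::DEFAULT
+    #                                   : CreationMeta::NO_GRAD_MODE)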
+ f = fn.func + base_name = get_base_name(f) + view_info = get_view_info(f) + call = "" + differentiable_outputs = gen_differentiable_outputs(fn) + differentiable_output_vars = {r.name for r in differentiable_outputs} + if not isinstance(view_info, str): + raise TypeError( + f"The view info should be a string for {base_name}, but it is: {view_info}" + ) + if len(differentiable_output_vars) == 0: + # no output is differentiable (.indices() for SparseTensors for example) + rhs_value = ( + f"as_view({view_info}, {var}, " + f"/* is_bw_differentiable */ false, /* is_fw_differentiable */ false)" + ) + elif len(differentiable_output_vars) == 1: + # Single differentiable output (Tensor or Tensor[]) + return_info = differentiable_outputs[0] + # We only support simple Tensor or a TensorList for functions that return views + if not is_tensor_type(return_info.type) and not is_tensor_list_type( + return_info.type + ): + raise RuntimeError( + f"{base_name} that return differentiable views can only return Tensor or Tensor[]" + ) + + # See Note [ View + Inplace detection] + def get_creation_meta_in_mode(original: str) -> str: + creation_meta_with_grad_mode = f"(at::GradMode::is_enabled() ? {original} : CreationMeta::NO_GRAD_MODE)" + return f"InferenceMode::is_enabled() ? CreationMeta::INFERENCE_MODE : {creation_meta_with_grad_mode}" + + # Only allow rebasing of the history if we return a single Tensor + # If we are in a no grad block, raise a warning + # See NOTE [ View + Inplace detection ] for more details about this logic + if is_tensor_list_type(return_info.type): + creation_meta = get_creation_meta_in_mode("CreationMeta::MULTI_OUTPUT_NODE") + view_idx = "view_idx" + view_func = emit_view_func( + f, extract_bindings(f), view_idx=view_idx + ).strip() + as_view_call = ( + f"as_view(/* base */ {view_info}, /* output */ {var}[{view_idx}], " + "/* is_bw_differentiable */ true, /* is_fw_differentiable */ true, " + "/* view_func */ std::move(func), /* rev_view_func */ rev_func, " + f"/* creation_meta */ {creation_meta});" + ) + call += MULTI_OUTPUT_VIEW_ITERATION.substitute( + var=var, view_idx=view_idx, body=f"{view_func}\n{as_view_call}" + ) + rhs_value = f"std::move({var})" + else: + call += emit_view_func(f, extract_bindings(f), view_idx=None) + creation_meta = get_creation_meta_in_mode("CreationMeta::DEFAULT") + rhs_value = ( + f"as_view(/* base */ {view_info}, /* output */ {var}, /* is_bw_differentiable */ true, " + "/* is_fw_differentiable */ true, " + f"/* view_func */ std::move(func), /* rev_view_func */ rev_func, /* creation_meta */ {creation_meta})" + ) + else: + # This could be supported but we don't need it at the moment, so keeping things simple. + raise RuntimeError( + "Function that return multiple differentiable output " + "when at least one of them is view is not supported." + ) + return call, rhs_value + + +def modifies_arguments(f: NativeFunction) -> bool: + return f.func.kind() in [SchemaKind.inplace, SchemaKind.out] + + +@with_native_function_with_differentiability_info +def emit_inplace_or_view_body(fn: NativeFunctionWithDifferentiabilityInfo) -> list[str]: + f = fn.func + inplace_view_body: list[str] = [] + + dispatcher_sig = DispatcherSignature.from_schema(f.func) + dispatcher_exprs = dispatcher_sig.exprs() + + # code-generated ADInplaceOrView kernels plumb and recompute dispatch keys directly through the kernel for performance. + # See Note [Plumbing Keys Through The Dispatcher] for details. 
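+    # NOTE(editor): sketch of the emitted kernel body for an in-place op such
+    # as add_.Tensor (reconstructed from INPLACE_REDISPATCH; illustrative only):
+    #   {
+    #     at::AutoDispatchBelowADInplaceOrView guard;
+    #     at::_ops::add__Tensor::redispatch(
+    #         ks & c10::after_ADInplaceOrView_keyset, self, other, alpha);
+    #   }
+    #   increment_version(self);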
+ dispatch_key_set = "ks & c10::after_ADInplaceOrView_keyset" + redispatch_args = ", ".join([dispatch_key_set] + [a.expr for a in dispatcher_exprs]) + + # Note that this calls the slow, dispatching variants of manual_cpp_binding ops. + # We could probably work harder to ensure that the fast variants are called instead, but the perf benefit would be minimal. + if modifies_arguments(f): # inplace op + inplace_view_body.append( + INPLACE_REDISPATCH.substitute( + unambiguous_name=f.func.name.unambiguous_name(), + unpacked_args=redispatch_args, + ) + ) + for r in cpp.return_names(f): + inplace_view_body.append(f"increment_version({r});") + else: + assert get_view_info(f) is not None + inplace_view_body.append( + VIEW_REDISPATCH.substitute( + assign_return_values="auto " + TMP_VAR + " = ", + unambiguous_name=f.func.name.unambiguous_name(), + unpacked_args=redispatch_args, + ) + ) + call, rhs_value = emit_view_body(fn, TMP_VAR) + inplace_view_body.append(call) + assert rhs_value is not None + inplace_view_body.append( + ASSIGN_RETURN_VALUE.substitute( + return_values=tie_return_values(f), rhs_value=rhs_value + ) + ) + if f.func.returns: + inplace_view_body.append(f"return {get_return_value(f)};") + return inplace_view_body + + +@with_native_function +def gen_formals(f: NativeFunction) -> str: + return ", ".join( + # code-generated autograd kernels plumb and recompute dispatch keys directly through the kernel for performance. + # See Note [Plumbing Keys Through The Dispatcher] for details. + ["c10::DispatchKeySet ks"] + + [ + f"{cpp.argument_type(a, binds='__placeholder__', symint=True).cpp_type()} {a.name}" + for a in f.func.schema_order_arguments() + ] + ) + + +@with_native_function_with_differentiability_info +def inplace_or_view_method_definition( + fn: NativeFunctionWithDifferentiabilityInfo, +) -> str | None: + f = fn.func + if get_view_info(f) is None and ( + # For functions that modify their inputs but don't return them, + # we can't give them autograd support. 
+        # See https://github.com/pytorch/pytorch/issues/53796
+        not modifies_arguments(f) or len(f.func.returns) == 0
+    ):
+        return None
+    return METHOD_DEFINITION.substitute(
+        return_type=cpp.returns_type(f.func.returns, symint=True).cpp_type(),
+        type_wrapper_name=type_wrapper_name(f),
+        formals=gen_formals(f),
+        type_definition_body=emit_inplace_or_view_body(fn),
+    )
+
+
+@with_native_function_with_differentiability_info
+def inplace_or_view_method_registration(
+    fn: NativeFunctionWithDifferentiabilityInfo,
+) -> str | None:
+    f = fn.func
+    if get_view_info(f) is None and (
+        not modifies_arguments(f) or len(f.func.returns) == 0
+    ):
+        return None
+    return WRAPPER_REGISTRATION.substitute(
+        unqual_operator_name_with_overload=f.func.name,
+        type_wrapper_name=type_wrapper_name(f),
+        class_type="ADInplaceOrView",
+    )
+
+
+def use_derived(fn: NativeFunctionWithDifferentiabilityInfo) -> bool:
+    f = fn.func
+    name = cpp.name(f.func)
+    return name not in MANUAL_AUTOGRAD and dispatch_strategy(fn) == "use_derived"
+
+
+def gen_inplace_or_view_type_env(
+    fn: NativeFunctionWithDifferentiabilityInfo,
+) -> dict[str, list[str]]:
+    definition = inplace_or_view_method_definition(fn)
+    registration = inplace_or_view_method_registration(fn)
+
+    return {
+        "ops_headers": (
+            [f"#include <ATen/ops/{fn.func.root_name}_ops.h>"]
+            if definition is not None
+            else []
+        ),
+        "inplace_or_view_method_definitions": [definition]
+        if definition is not None
+        else [],
+        "inplace_or_view_wrapper_registrations": [registration]
+        if registration is not None
+        else [],
+    }
+
+
+def gen_inplace_or_view_type(
+    out: str,
+    native_yaml_path: str,
+    tags_yaml_path: str,
+    fns_with_infos: list[NativeFunctionWithDifferentiabilityInfo],
+    template_path: str,
+) -> None:
+    # NOTE: see Note [Sharded File] at the top of the VariableType.cpp
+    # template regarding sharding of the generated files.
+
+    fm = FileManager(install_dir=out, template_dir=template_path, dry_run=False)
+    fm.write_sharded(
+        "ADInplaceOrViewType.cpp",
+        [fn for fn in fns_with_infos if use_derived(fn)],
+        key_fn=lambda fn: fn.func.root_name,
+        base_env={
+            "generated_comment": "@"
+            + f"generated from {fm.template_dir_for_comments()}/ADInplaceOrViewType.cpp",
+        },
+        env_callable=gen_inplace_or_view_type_env,
+        num_shards=2,
+        sharded_keys={
+            "ops_headers",
+            "inplace_or_view_method_definitions",
+            "inplace_or_view_wrapper_registrations",
+        },
+    )
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_python_functions.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_python_functions.py
new file mode 100644
index 0000000000000000000000000000000000000000..af25d55ef38d87fc0d9398437f116f234634932d
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_python_functions.py
@@ -0,0 +1,1405 @@
+# Generates Python bindings for ATen functions
+#
+# The bindings are generated as methods on python_variable or functions on the
+# torch._C._nn, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._sparse
+# or torch._C._special objects.
+#
+
+# Code tries to stick to the following rules:
+#
+# - templates should be colocated with the functions that use them.
+#   no templates are currently shared between functions, but if that
+#   happens, maybe put the template with the first one
+#
+# - don't use environment dictionaries when calling template.substitute().
+# pass named arguments directly for everything, otherwise it's much too +# hard to track what's actually being used and by who +# +# - colocate any new hacks/adjustments with existing ones of the same kind. +# ideally in a data structure rather than code if possible. See e.g. +# SCHEMA_DEFAULT_CONVERSION_HACKS, etc. +# +# - similarly, conversions from one format to another should ideally happen +# all at once in a single place. +# +# - no nontrivial nested functions. couple-liners are ok but please no more. +# especially avoid functions that read/write outer variables defined far away. +# +# - raise RuntimeError instead of asserting, and put as much +# information as is available into the message. I.e. no need to +# plumb in new params whose only purpose is to fill out an error +# message, but use what's there +# + +from __future__ import annotations + +import itertools +import re +from collections import defaultdict +from typing import TYPE_CHECKING + +import yaml + +from torchgen.api import cpp +from torchgen.api.python import ( + arg_parser_output_exprs, + cpp_dispatch_exprs, + cpp_dispatch_target, + dispatch_lambda_args, + dispatch_lambda_exprs, + dispatch_lambda_return_str, + has_tensor_options, + PythonSignature, + PythonSignatureDeprecated, + PythonSignatureGroup, + PythonSignatureNativeFunctionPair, + signature, + signature_from_schema, + structseq_fieldnames, +) +from torchgen.code_template import CodeTemplate +from torchgen.context import with_native_function +from torchgen.gen import cpp_string, parse_native_yaml, parse_tags_yaml +from torchgen.model import ( + Argument, + BaseOperatorName, + FunctionSchema, + NativeFunction, + SchemaKind, + Type, + Variant, +) +from torchgen.utils import FileManager, split_name_params +from torchgen.yaml_utils import YamlLoader + +from .gen_inplace_or_view_type import is_tensor_list_type +from .gen_trace_type import should_trace + + +if TYPE_CHECKING: + from collections.abc import Callable, Iterable, Sequence + + +# +# declarations blocklist +# We skip codegen for these functions, for various reasons. +# Future PRs will categorize this list and eliminate or hoist +# them out of eager-only codegen. 
+# See https://github.com/pytorch/pytorch/issues/30788 +# + +# These functions require manual Python bindings or are not exposed to Python +_SKIP_PYTHON_BINDINGS = [ + "alias", + "contiguous", + "is_cuda", + "is_sparse", + "is_sparse_csr", + "size", + "stride", + "sym_is_contiguous", + "sym_size", + "sym_stride", + "sym_storage_offset", + "sym_numel", + ".*_backward", + ".*_backward_(out|input|weight|bias)", + ".*_forward", + ".*_forward_out", + ".*_jvp", + "_unsafe_view", + "tensor", + "_?sparse_(coo|compressed|csr|csc|bsr|bsc)_tensor.*", + "_range.*", + "_sparse_add_out", + "_sparse_div.*", + "_sparse_mul.*", + "_sparse_sub.*", + "_sparse_dense_add_out", + "index", + "index_out", + "unique_dim_consecutive", + "_cumsum.*", + "_cumprod.*", + "_sum.*", + "_prod.*", + "_th_.*", + "_thnn_.*", + "range.*", + "_solve.*", + "_inverse.*", + "_cholesky.*", + "_triangular_solve.*", + "_qr.*", + "_svd.*", + "slice", + "item", + "_local_scalar_dense", + "to", + "_to_copy", + "_to_copy_out", + "_reshape_copy", + "_reshape_copy_out", + "copy_sparse_to_sparse_", + "copy_", + "_foreach_copy", + "numpy_T", + "matrix_H", + "mT", + "mH", # these need to be an attributes in Python, not functions + "nonzero(_(out|numpy))?", + "set_data", + ".*_overrideable", # overridable functions for backend extension + "data", + "is_leaf", + "output_nr", + "_version", + "requires_grad_", + "retains_grad", + "set_", + "_fw_primal", + "fake_quantize_per_tensor_affine_cachemask", + "fake_quantize_per_channel_affine_cachemask", + "_new_zeros_with_same_feature_meta", + "_has_same_storage_numel", # used for forward AD internals + "_reshape_alias", + "replace_", # only used by the functionalization pass, doesn't need to be exposed to python + "copy", # only used by the functionalization pass + "fill.Tensor", # only used by the functionalization pass + "fill.Scalar", # only used by the functionalization pass + "lift.*", + "normal_functional", # only used by the functionalization pass + "nbytes", + "itemsize", + "_batch_norm_with_update", + "_batch_norm_with_update_out", + "_batch_norm_no_update", +] + +SKIP_PYTHON_BINDINGS = [ + re.compile(rf"^{pattern}$") for pattern in _SKIP_PYTHON_BINDINGS +] + +# These function signatures are not exposed to Python. Note that this signature +# list does not support regex. +SKIP_PYTHON_BINDINGS_SIGNATURES = [ + "add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> Tensor", + "add_.Scalar(Tensor(a!) self, Scalar other, Scalar alpha=1) -> Tensor(a!)", + "sub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> Tensor", + "sub_.Scalar(Tensor(a!) self, Scalar other, Scalar alpha=1) -> Tensor(a!)", + "mul.Scalar(Tensor self, Scalar other) -> Tensor", + "mul_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!)", + "div.Scalar(Tensor self, Scalar other) -> Tensor", + "div_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!)", +] + + +@with_native_function +def should_generate_py_binding(f: NativeFunction) -> bool: + # NativeFunctions that are entirely code-generated should not get python bindings + # because these codegen implementations are often inefficient. A handful of + # view_copy style ops were exposed accidentally when they were handwritten and now + # that we are moving them to codegen for bc reasons we need to keep them exposed in + # python. 
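+    # NOTE(editor): illustrative outcomes of the checks below (toy cases):
+    #   op tagged "generated" and "view_copy"            -> kept (carve-out)
+    #   op tagged "generated" only                       -> skipped
+    #   name matching e.g. ".*_backward"                 -> skipped via SKIP_PYTHON_BINDINGS
+    #   exact schema in SKIP_PYTHON_BINDINGS_SIGNATURES  -> skipped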
+ if "generated" in f.tags and "view_copy" not in f.tags: + return False + + name = cpp.name(f.func) + for skip_regex in SKIP_PYTHON_BINDINGS: + if skip_regex.match(name): + return False + + signature = str(f.func) + for pattern in SKIP_PYTHON_BINDINGS_SIGNATURES: + if pattern == signature: + return False + return True + + +def get_pycname(name: BaseOperatorName) -> str: + return f"THPVariable_{name}" + + +def is_noarg(overloads: Sequence[PythonSignatureNativeFunctionPair]) -> bool: + return len(overloads) == 1 and overloads[0].signature.arguments_count() == 0 + + +def is_py_variable_method(f: NativeFunction) -> bool: + return f.python_module is None and Variant.method in f.variants + + +def is_py_torch_function(f: NativeFunction) -> bool: + return f.python_module is None and Variant.function in f.variants + + +def is_py_nn_function(f: NativeFunction) -> bool: + return f.python_module == "nn" + + +def is_py_fft_function(f: NativeFunction) -> bool: + return f.python_module == "fft" + + +def is_py_linalg_function(f: NativeFunction) -> bool: + return f.python_module == "linalg" + + +def is_py_nested_function(f: NativeFunction) -> bool: + return f.python_module == "nested" + + +def is_py_sparse_function(f: NativeFunction) -> bool: + return f.python_module == "sparse" + + +def is_py_special_function(f: NativeFunction) -> bool: + return f.python_module == "special" + + +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # +# +# Main Function +# +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # + + +def gen( + out: str, + native_yaml_path: str, + tags_yaml_path: str, + deprecated_yaml_path: str, + template_path: str, + *, + symint: bool = True, +) -> None: + fm = FileManager(install_dir=out, template_dir=template_path, dry_run=False) + native_functions = parse_native_yaml( + native_yaml_path, tags_yaml_path + ).native_functions + native_functions = list(filter(should_generate_py_binding, native_functions)) + + methods = load_signatures(native_functions, deprecated_yaml_path, method=True) + create_python_bindings( + fm, + methods, + is_py_variable_method, + None, + "python_variable_methods.cpp", + method=True, + symint=symint, + ) + + # NOTE: num_shards here must be synced with gatherTorchFunctions in + # torch/csrc/autograd/python_torch_functions_manual.cpp + functions = load_signatures(native_functions, deprecated_yaml_path, method=False) + create_python_bindings_sharded( + fm, + functions, + is_py_torch_function, + "torch", + "python_torch_functions.cpp", + method=False, + num_shards=3, + symint=symint, + ) + + create_python_bindings( + fm, + functions, + is_py_nn_function, + "torch.nn", + "python_nn_functions.cpp", + method=False, + symint=symint, + ) + + create_python_bindings( + fm, + functions, + is_py_fft_function, + "torch.fft", + "python_fft_functions.cpp", + method=False, + symint=symint, + ) + + create_python_bindings( + fm, + functions, + is_py_linalg_function, + "torch.linalg", + "python_linalg_functions.cpp", + method=False, + symint=symint, + ) + + create_python_bindings( + fm, + functions, + is_py_nested_function, + "torch.nested", + "python_nested_functions.cpp", + method=False, + ) + + create_python_bindings( + fm, + functions, + is_py_sparse_function, + "torch.sparse", + "python_sparse_functions.cpp", + method=False, + symint=symint, + ) + + create_python_bindings( + fm, + functions, + is_py_special_function, + "torch.special", + "python_special_functions.cpp", + method=False, + symint=symint, + ) + + # Currently, we only use 
`functions` to generate `return_types` bindings.
+    # All methods which return structseq have function variant at this point.
+    # If any method only operator with structseq is added in the future,
+    # we will have to address that.
+    create_python_return_type_bindings(
+        fm, functions, lambda fn: True, "python_return_types.cpp"
+    )
+    create_python_return_type_bindings_header(
+        fm, functions, lambda fn: True, "python_return_types.h"
+    )
+
+    valid_tags = parse_tags_yaml(tags_yaml_path)
+
+    def gen_tags_enum() -> dict[str, str]:
+        return {
+            "enum_of_valid_tags": (
+                "".join(
+                    [f'\n.value("{tag}", at::Tag::{tag})' for tag in sorted(valid_tags)]
+                )
+            )
+        }
+
+    fm.write("python_enum_tag.cpp", gen_tags_enum)
+
+
+def group_filter_overloads(
+    pairs: Sequence[PythonSignatureNativeFunctionPair],
+    pred: Callable[[NativeFunction], bool],
+) -> dict[BaseOperatorName, list[PythonSignatureNativeFunctionPair]]:
+    grouped: dict[BaseOperatorName, list[PythonSignatureNativeFunctionPair]] = (
+        defaultdict(list)
+    )
+    for pair in pairs:
+        if pred(pair.function):
+            grouped[pair.function.func.name.name].append(pair)
+    return grouped
+
+
+def create_python_bindings(
+    fm: FileManager,
+    pairs: Sequence[PythonSignatureNativeFunctionPair],
+    pred: Callable[[NativeFunction], bool],
+    module: str | None,
+    filename: str,
+    *,
+    method: bool,
+    symint: bool = True,
+) -> None:
+    """Generates Python bindings to ATen functions"""
+    py_methods: list[str] = []
+    ops_headers: list[str] = []
+    py_method_defs: list[str] = []
+    py_forwards: list[str] = []
+
+    grouped = group_filter_overloads(pairs, pred)
+
+    for name in sorted(grouped.keys(), key=str):
+        overloads = grouped[name]
+        py_methods.append(
+            method_impl(name, module, overloads, method=method, symint=symint)
+        )
+        py_method_defs.append(method_def(name, module, overloads, method=method))
+        py_forwards.extend(forward_decls(name, overloads, method=method))
+        ops_headers.append(f"#include <ATen/ops/{name.base}.h>")
+
+    fm.write_with_template(
+        filename,
+        filename,
+        lambda: {
+            "generated_comment": "@"
+            + f"generated from {fm.template_dir_for_comments()}/{filename}",
+            "ops_headers": ops_headers,
+            "py_forwards": py_forwards,
+            "py_methods": py_methods,
+            "py_method_defs": py_method_defs,
+        },
+    )
+
+
+def create_python_return_type_bindings(
+    fm: FileManager,
+    pairs: Sequence[PythonSignatureNativeFunctionPair],
+    pred: Callable[[NativeFunction], bool],
+    filename: str,
+) -> None:
+    """
+    Generate function to initialize and return named tuple for native functions
+    which returns named tuple and registration invocations in `python_return_types.cpp`.
+ """ + py_return_types_definition: list[str] = [] + py_return_types_registrations: list[str] = [] + + grouped = group_filter_overloads(pairs, pred) + + for name in sorted(grouped.keys(), key=str): + overloads = grouped[name] + definitions, registrations = generate_return_type_definition_and_registrations( + overloads + ) + py_return_types_definition.append( + "" if not definitions else "\n".join(definitions) + ) + py_return_types_registrations.append( + "" if not registrations else "\n".join(registrations) + ) + + fm.write_with_template( + filename, + filename, + lambda: { + "generated_comment": "@" + + f"generated from {fm.template_dir_for_comments()}/{filename}", + "py_return_types": py_return_types_definition, + "py_return_types_registrations": py_return_types_registrations, + }, + ) + + +def create_python_return_type_bindings_header( + fm: FileManager, + pairs: Sequence[PythonSignatureNativeFunctionPair], + pred: Callable[[NativeFunction], bool], + filename: str, +) -> None: + """ + Generate function to initialize and return named tuple for native functions + which returns named tuple and relevant entry for the map in `python_return_types.cpp`. + """ + py_return_types_declarations: list[str] = [] + + grouped = group_filter_overloads(pairs, pred) + + for name in sorted(grouped.keys(), key=str): + overloads = grouped[name] + declarations = generate_return_type_declarations(overloads) + py_return_types_declarations.append( + "" if not declarations else "\n".join(declarations) + ) + + fm.write_with_template( + filename, + filename, + lambda: { + "generated_comment": "@" + + f"generated from {fm.template_dir_for_comments()}/{filename}", + "py_return_types_declarations": py_return_types_declarations, + }, + ) + + +def create_python_bindings_sharded( + fm: FileManager, + pairs: Sequence[PythonSignatureNativeFunctionPair], + pred: Callable[[NativeFunction], bool], + module: str | None, + filename: str, + *, + method: bool, + num_shards: int, + symint: bool = True, +) -> None: + """Generates Python bindings to ATen functions""" + grouped = group_filter_overloads(pairs, pred) + + def key_func( + kv: tuple[BaseOperatorName, list[PythonSignatureNativeFunctionPair]], + ) -> str: + return kv[0].base + + def env_func( + kv: tuple[BaseOperatorName, list[PythonSignatureNativeFunctionPair]], + ) -> dict[str, list[str]]: + name, fn_pairs = kv + return { + "ops_headers": [f"#include "], + "py_forwards": list(forward_decls(name, fn_pairs, method=method)), + "py_methods": [ + method_impl(name, module, fn_pairs, method=method, symint=symint) + ], + "py_method_defs": [method_def(name, module, fn_pairs, method=method)], + } + + fm.write_sharded( + filename, + grouped.items(), + base_env={ + "generated_comment": "@" + + f"generated from {fm.template_dir_for_comments()}/{filename}", + }, + key_fn=key_func, + env_callable=env_func, + num_shards=num_shards, + sharded_keys={"ops_headers", "py_forwards", "py_methods", "py_method_defs"}, + ) + + +def load_signatures( + native_functions: list[NativeFunction], + deprecated_yaml_path: str, + *, + method: bool, + skip_deprecated: bool = False, + pyi: bool = False, +) -> Sequence[PythonSignatureNativeFunctionPair]: + @with_native_function + def gen_signature_pairs(f: NativeFunction) -> PythonSignatureNativeFunctionPair: + return PythonSignatureNativeFunctionPair( + signature=signature(f, method=method, pyi=pyi), + function=f, + ) + + pairs = list(map(gen_signature_pairs, native_functions)) + deprecated = load_deprecated_signatures( + pairs, deprecated_yaml_path, 
method=method, pyi=pyi + ) + return pairs if skip_deprecated else pairs + deprecated + + +def load_deprecated_signatures( + pairs: Sequence[PythonSignatureNativeFunctionPair], + deprecated_yaml_path: str, + *, + method: bool, + pyi: bool, +) -> list[PythonSignatureNativeFunctionPair]: + # The deprecated.yaml doesn't have complete type information, we need + # find and leverage the original ATen signature (to which it delegates + # the call) to generate the full python signature. + # We join the deprecated and the original signatures using type-only form. + + # group the original ATen signatures by name + grouped: dict[str, list[PythonSignatureNativeFunctionPair]] = defaultdict(list) + for pair in pairs: + grouped[pair.signature.name].append(pair) + + # find matching original signatures for each deprecated signature + results: list[PythonSignatureNativeFunctionPair] = [] + + with open(deprecated_yaml_path) as f: + deprecated_defs = yaml.load(f, Loader=YamlLoader) + + for deprecated in deprecated_defs: + schema = FunctionSchema.parse(deprecated["name"]) + aten_name, call_args = split_name_params(deprecated["aten"]) + is_out = aten_name.endswith("_out") + if is_out: + aten_name = aten_name.replace("_out", "") + + # HACK: these are fixed constants used to pass the aten function. + # The type must be known ahead of time + known_constants = { + "1": Type.parse("Scalar"), + } + schema_args_by_name = {a.name: a for a in schema.arguments.flat_all} + for name in call_args: + assert name in schema_args_by_name or name in known_constants, ( + f"deprecation definition: Unrecognized value {name}" + ) + + # Map deprecated signature arguments to their aten signature and test + # if the types and alias annotation match. + def is_schema_compatible( + aten_schema: FunctionSchema, + ) -> bool: + arguments: Iterable[Argument] + if is_out: + arguments = itertools.chain( + aten_schema.arguments.out, aten_schema.arguments.flat_non_out + ) + else: + arguments = aten_schema.arguments.flat_all + + for i, arg in enumerate(arguments): + if i < len(call_args): + arg_name = call_args[i] + if arg_name in known_constants: + schema_type = known_constants[arg_name] + schema_annotation = None + else: + schema_arg = schema_args_by_name[arg_name] + schema_type = schema_arg.type + schema_annotation = schema_arg.annotation + + if schema_type != arg.type or schema_annotation != arg.annotation: + return False + else: + if arg.default is None: + return False + + return len(schema.returns) == len(aten_schema.returns) and all( + a == b for a, b in zip(schema.returns, aten_schema.returns) + ) + + any_schema_found = False + for pair in grouped[aten_name]: + if not is_schema_compatible(pair.function.func): + continue + any_schema_found = True + + python_sig = signature_from_schema( + schema, + category_override=pair.function.category_override, + method=method, + pyi=pyi, + ) + + results.append( + PythonSignatureNativeFunctionPair( + signature=PythonSignatureDeprecated( + name=python_sig.name, + input_args=python_sig.input_args, + input_kwargs=python_sig.input_kwargs, + output_args=python_sig.output_args, + tensor_options_args=python_sig.tensor_options_args, + method=python_sig.method, + deprecated_schema=schema, + deprecated_args_exprs=tuple(call_args), + returns=python_sig.returns, + ), + function=pair.function, + ) + ) + assert any_schema_found, ( + f"No native function with name {aten_name} matched signature:\n {str(schema)}" + ) + + return results + + +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # +# +# 
Named Tuple Codegen +# +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # + + +@with_native_function +def gen_structseq_typename_key(f: NativeFunction) -> str: + name = cpp.name(f.func) + fieldnames = structseq_fieldnames(f.func.returns) + return "_".join([name] + fieldnames) + + +def emit_structseq_call( + overloads: Sequence[PythonSignatureNativeFunctionPair], +) -> tuple[list[str], dict[str, str]]: + """ + Generate block of named tuple type def inits, and add typeref snippets + to declarations that use them + """ + typenames: dict[ + str, str + ] = {} # map from unique name + field name lists to typedef name + typedefs: list[str] = [] # typedef declarations and init code + + for overload in overloads: + fieldnames = structseq_fieldnames(overload.function.func.returns) + if not fieldnames: + continue + + name = cpp.name(overload.function.func) # use @with_native_function? + tn_key = gen_structseq_typename_key(overload.function) + typename = typenames.get(tn_key) + if typename is None: + typename = f"NamedTuple{'' if not typedefs else len(typedefs)}" + typenames[tn_key] = typename + typedefs.append( + f"""\ +static PyTypeObject* {typename} = generated::get_{name}_structseq();""" + ) + + return typedefs, typenames + + +def generate_return_type_definition_and_registrations( + overloads: Sequence[PythonSignatureNativeFunctionPair], +) -> tuple[list[str], list[str]]: + """ + Generate block of function in `python_return_types.cpp` to initialize + and return named tuple for a native function which returns named tuple + and registration invocations in same file. + """ + typenames: dict[ + str, str + ] = {} # map from unique name + field name lists to typedef name + definitions: list[str] = [] # function definition to register the typedef + registrations: list[str] = [] # register call for the typedef + + for overload in overloads: + fieldnames = structseq_fieldnames(overload.function.func.returns) + if not fieldnames: + continue + + fields = ", ".join(f'{{"{fn}", ""}}' for fn in fieldnames) + + name = cpp.name(overload.function.func) # use @with_native_function? + tn_key = gen_structseq_typename_key(overload.function) + typename = typenames.get(tn_key) + + if typename is None: + typename = f"{name}NamedTuple{'' if not definitions else len(definitions)}" + typenames[tn_key] = typename + definitions.append( + f"""\ +PyTypeObject* get_{name}_structseq() {{ + static PyStructSequence_Field NamedTuple_fields[] = {{ {fields}, {{nullptr}} }}; + static PyTypeObject {typename}; + static bool is_initialized = false; + static PyStructSequence_Desc desc = {{ "torch.return_types.{name}", nullptr, NamedTuple_fields, {len(fieldnames)} }}; + if (!is_initialized) {{ + PyStructSequence_InitType(&{typename}, &desc); + {typename}.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; + is_initialized = true; + }} + return &{typename}; +}} +""" + ) + registrations.append( + f'addReturnType(return_types_module, "{name}", generated::get_{name}_structseq());' + ) + + return definitions, registrations + + +def generate_return_type_declarations( + overloads: Sequence[PythonSignatureNativeFunctionPair], +) -> list[str]: + """ + Generate block of function declarations in `python_return_types.h` to initialize + and return named tuple for a native function. 
+    """
+    typenames: dict[
+        str, str
+    ] = {}  # map from unique name + field name lists to typedef name
+    declarations: list[str] = []  # function declaration to register the typedef
+
+    for overload in overloads:
+        fieldnames = structseq_fieldnames(overload.function.func.returns)
+        if not fieldnames:
+            continue
+
+        name = cpp.name(overload.function.func)  # use @with_native_function?
+        tn_key = gen_structseq_typename_key(overload.function)
+        typename = typenames.get(tn_key)
+
+        if typename is None:
+            typename = (
+                f"{name}NamedTuple{'' if not declarations else len(declarations)}"
+            )
+            typenames[tn_key] = typename
+            declarations.append(f"PyTypeObject* get_{name}_structseq();")
+
+    return declarations
+
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
+#
+# Method Impl Codegen
+#
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
+
+# python binding for all overloads of a particular function/method
+PY_VARIABLE_METHOD_VARARGS = CodeTemplate(
+    r"""\
+// ${name}
+static PyObject * ${pycname}(PyObject* self_, PyObject* args, PyObject* kwargs)
+{
+  ${method_header}
+  static PythonArgParser parser({
+    ${signatures}
+  }, /*traceable=*/${traceable});
+
+  ParsedArgs<${max_args}> parsed_args;
+  auto _r = parser.parse(${self_}, args, kwargs, parsed_args);
+  ${check_has_torch_function}
+  switch (_r.idx) {
+    ${dispatch}
+  }
+  ${method_footer}
+}
+
+"""
+)
+
+# handler for a single parsed signature - may be a single overload or
+# a pair of overloads whose signatures differ only in output params
+# (plugged into PY_VARIABLE_METHOD_VARARGS as an item in ${dispatch})
+PY_VARIABLE_CASE = CodeTemplate(
+    """\
+case ${overload_index}: {
+  ${body}
+}
+"""
+)
+
+# python binding for single-overload function/method
+PY_VARIABLE_METHOD_VARARGS_SINGLETON = CodeTemplate(
+    """\
+// ${name}
+static PyObject * ${pycname}(PyObject* self_, PyObject* args, PyObject* kwargs)
+{
+  ${method_header}
+  static PythonArgParser parser({
+    ${signatures}
+  }, /*traceable=*/${traceable});
+
+  ParsedArgs<${max_args}> parsed_args;
+  auto _r = parser.parse(${self_}, args, kwargs, parsed_args);
+  ${check_has_torch_function}
+  ${dispatch}
+  ${method_footer}
+}
+
+"""
+)
+
+# python binding for a method with no args, shortcuts parsing
+PY_VARIABLE_METHOD_NOARGS = CodeTemplate(
+    """\
+// ${name}
+static PyObject * ${pycname}(PyObject* self_, PyObject* args)
+{
+  ${method_header}
+  ${check_has_torch_function}
+  ${dispatch}
+  ${method_footer}
+}
+
+"""
+)
+
+
+def method_impl(
+    name: BaseOperatorName,
+    module: str | None,
+    overloads: Sequence[PythonSignatureNativeFunctionPair],
+    *,
+    method: bool,
+    symint: bool = True,
+) -> str:
+    """
+    Generate a python binding for all overloads of an op.
+ """ + pycname = get_pycname(name) + noarg = is_noarg(overloads) + structseq_inits, structseq_typenames = emit_structseq_call(overloads) + + method_header = ["HANDLE_TH_ERRORS"] + method_header += structseq_inits + method_header += ( + ["const Tensor& self = THPVariable_Unpack(self_);"] if method else [] + ) + + method_footer = ([] if noarg else ["Py_RETURN_NONE;"]) + ["END_HANDLE_TH_ERRORS"] + + traceable = "true" if all(should_trace(o.function) for o in overloads) else "false" + + grouped_overloads: Sequence[PythonSignatureGroup] = group_overloads( + overloads, symint=symint + ) + is_singleton = len(grouped_overloads) == 1 + signatures: list[str] = [] + dispatch: list[str] = [] + for overload_index, overload in enumerate(grouped_overloads): + signature = overload.signature.signature_str(symint=symint) + signatures.append(f"{cpp_string(str(signature))},") + dispatch_body = emit_dispatch_case(overload, structseq_typenames, symint=symint) + dispatch.append( + PY_VARIABLE_CASE.substitute( + overload_index=overload_index, body=dispatch_body + ) + if not is_singleton + else dispatch_body + ) + + if noarg: + template = PY_VARIABLE_METHOD_NOARGS + elif is_singleton: + template = PY_VARIABLE_METHOD_VARARGS_SINGLETON + else: + template = PY_VARIABLE_METHOD_VARARGS + + return template.substitute( + name=name, + pycname=pycname, + method_header=method_header, + max_args=max(o.signature.arguments_count() for o in overloads), + signatures=signatures, + traceable=traceable, + check_has_torch_function=gen_has_torch_function_check( + name=name, + module=module, + noarg=noarg, + method=method, + ), + dispatch=dispatch, + method_footer=method_footer, + self_="self_" if method else "nullptr", + ) + + +def gen_has_torch_function_check( + name: BaseOperatorName, module: str | None, *, noarg: bool, method: bool +) -> str: + if noarg: + if method: + return f"""\ +if(check_has_torch_function(self_)) {{ + return handle_torch_function(self_, "{name}"); +}} +""" + else: + return "" + + self_ = "self_" if method else "nullptr" + namespace = ( + { + "torch": "THPVariableFunctionsModule", + "torch.nn": "THPNNVariableFunctionsModule", + "torch.fft": "THPFFTVariableFunctionsModule", + "torch.linalg": "THPLinalgVariableFunctionsModule", + "torch.nested": "THPNestedVariableFunctionsModule", + "torch.sparse": "THPSparseVariableFunctionsModule", + "torch.special": "THPSpecialVariableFunctionsModule", + }[module] + if module + else "THPVariableClass" + ) + + return f"""\ +if(_r.has_torch_function()) {{ + return handle_torch_function(_r, {self_}, args, kwargs, {namespace}, "{module or "torch.Tensor"}"); +}} +""" + + +# handler for output/no-output overload pair +PY_VARIABLE_OUT = CodeTemplate( + """\ +if (_r.isNone(${out_idx})) { + ${call_dispatch} +} else { + ${call_dispatch_out} +} +""" +) + + +def emit_dispatch_case( + overload: PythonSignatureGroup, + structseq_typenames: dict[str, str], + *, + symint: bool = True, +) -> str: + """ + Emit dispatch code for a single parsed signature. This corresponds to either + a single native function, or a pair that differ only in output params. In the + latter case, a single python signature is used for both and dispatching + switches on the presence/absence of passed output args. 
+ """ + if overload.outplace is not None: + # dispatch output and no-output variants, branch on _r.isNone() + return PY_VARIABLE_OUT.substitute( + out_idx=overload.signature.output_idx(), + call_dispatch=emit_single_dispatch( + overload.signature, overload.base, structseq_typenames, symint=symint + ), + call_dispatch_out=emit_single_dispatch( + overload.signature, + overload.outplace, + structseq_typenames, + symint=symint, + ), + ) + else: + # no-output version only + return emit_single_dispatch( + overload.signature, overload.base, structseq_typenames, symint=symint + ) + + +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # +# +# Forward Declarations Codegen +# +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # + + +def forward_decls( + name: BaseOperatorName, + overloads: Sequence[PythonSignatureNativeFunctionPair], + *, + method: bool, +) -> tuple[str, ...]: + if method: + return () + + pycname = get_pycname(name) + if is_noarg(overloads): + return ( + f"""\ +static PyObject * {pycname}(PyObject* self_, PyObject* args); +""", + ) + else: + return ( + f"""\ +static PyObject * {pycname}(PyObject* self_, PyObject* args, PyObject* kwargs); +""", + ) + + +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # +# +# Method Def (Binding Table Entry) Codegen +# +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # + + +def method_def( + name: BaseOperatorName, + module: str | None, + overloads: Sequence[PythonSignatureNativeFunctionPair], + *, + method: bool, +) -> str: + """ + Generate method def entry. + """ + pycname = get_pycname(name) + + if name.dunder_method: + # PyMethodDef entry for binary op, throws not implemented error + pycname = f"TypeError_to_NotImplemented_<{pycname}>" + + if is_noarg(overloads): + flags = "METH_NOARGS" if method else "METH_VARARGS | METH_KEYWORDS" + else: + pycname = f"castPyCFunctionWithKeywords({pycname})" + flags = "METH_VARARGS | METH_KEYWORDS" + + if module == "torch": + flags += " | METH_STATIC" + + return f'{{"{name}", {pycname}, {flags}, nullptr}},' + + +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # +# +# Overload Sorting and Grouping +# +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # + + +def group_overloads( + overloads: Sequence[PythonSignatureNativeFunctionPair], *, symint: bool = True +) -> Sequence[PythonSignatureGroup]: + bases: dict[str, PythonSignatureNativeFunctionPair] = {} + outplaces: dict[str, PythonSignatureNativeFunctionPair] = {} + + # first group by signature ignoring out arguments + for overload in overloads: + sig = overload.signature.signature_str(skip_outputs=True, symint=symint) + if overload.function.func.is_out_fn(): + if sig in outplaces: + raise RuntimeError( + f"Found duplicated function definition:\n- {overload.function.func}.\n" + f"Existing definition:\n- {outplaces[sig].function.func}." + ) + outplaces[sig] = overload + else: + if sig in bases: + raise RuntimeError( + f"Found duplicated function definition:\n- {overload.function.func}.\n" + f"Existing definition:\n- {bases[sig].function.func}." 
+ ) + bases[sig] = overload + + for sig, out in outplaces.items(): + if sig not in bases: + candidates: list[str] = [] + for overload in overloads: + if ( + str(overload.function.func.name.name) + == str(out.function.func.name.name) + and not overload.function.func.is_out_fn() + and not overload.signature.deprecated + ): + candidates.append( + overload.signature.signature_str( + skip_outputs=True, symint=symint + ) + ) + out_sig = out.signature.signature_str(symint=symint) + raise RuntimeError( + f"While identifying overloads, we found an out schema {out_sig} without a corresponding non-out variant. " + f"We expected the non-out variant to have schema: \n- {sig}\nPlease check that you spelled the schema " + "correctly in native_functions.yaml. We discovered the following candidate(s): \n" + + "\n".join(f"- {candidate}" for candidate in candidates) + ) + + grouped = [ + PythonSignatureGroup.from_pairs( + functional=base, + out=outplaces.get(sig), + ) + for sig, base in bases.items() + ] + return sort_overloads(grouped, symint=symint) + + +# This function declares a partial order on declarations, and sorts them according +# to its linear extension. This is necessary, because there's some ambiguity in the +# choice of overload, and we want a different order. +# +# See Note[Order of overloads matters] +# +# A few examples of ambiguous python signature pairs. +# +# All parameters have the same type, except one taking Tensor the other taking +# Scalar. A numeric PyObject can be casted into Tensor, and a zero-dim Tensor +# object can be accepted as Scalar type parameter (see python_arg_parser.cpp). +# Therefore, same input arguments might be accepted by either python signature. +# We want to always parse the one taking Tensor first. +# +# bitwise_and(Tensor input, Tensor other, *, Tensor out=None) +# bitwise_and(Tensor input, Scalar other, *, Tensor out=None) +# +# If they have different number of parameters then they are not ambiguous - but +# the difference on output param can be ignored as it's optional. +# +# multiply(Tensor input, Tensor other, *, Tensor out=None) +# multiply(Tensor input, Scalar other) +# +# Both positional args and keyword-only args are considered together. +# +# subtract(Tensor other, *, Scalar alpha=1) +# subtract(Scalar other, Scalar alpha=1) +# +# A few ambiguous cases which it does NOT handle yet. +# +# If there is any difference in other parameters besides the Tensor/Scalar +# difference, then they are not considered ambiguous by this method anymore. +# However, the difference could be too trivial to disambiguate. +# +# foo(Tensor input, Scalar other, Scalar bar) +# foo(Tensor input, Tensor other, double bar) +# +# If they are taking different number of parameters then they are not considered +# ambiguous anymore, even if the difference is only on optional kwargs. +# +# foo(Scalar other, Scalar alpha=1) +# foo(Tensor other, *, Scalar alpha=1, Scalar beta=1) +# + + +def sort_overloads( + grouped_overloads: Sequence[PythonSignatureGroup], *, symint: bool = True +) -> Sequence[PythonSignatureGroup]: + # NB: Smaller here means lower priority + + def is_arg_smaller(t1: Type, t2: Type) -> bool: + return ( + str(t1) == "Scalar" + and str(t2) == "Tensor" + or str(t1) == "Scalar?" + and str(t2) == "Tensor?" + or "Dimname" in str(t1) + and "Dimname" not in str(t2) + or + # In the discussion https://github.com/pytorch/pytorch/issues/54555 it has been + # discussed why it is important to prioritize int/int? 
over int[] + str(t1) == "int[]" + and (str(t2) == "int" or str(t2) == "int?") + or + # TensorList currently throws an error during argument parsing, that's why it needs to be + # last in signature ordering. See discussion: https://github.com/pytorch/pytorch/issues/58087 + str(t1) == "Tensor[]" + and str(t2).find("[]") != -1 + or + # Prioritize IntArrayRef overload over SymIntArrayRef + str(t1) == "SymInt[]" + and str(t2) == "int[]" + or + # Make sure both in, SymInt are sorted consistently w.r.t. Tensor since Tensor can be implicitly + # converted to either int or SymInt. Prioritize the Tensor overload since it otherwise gets shadowed. + (str(t1) == "SymInt" or str(t1) == "int") + and str(t2) == "Tensor" + ) + + def is_smaller(s1: PythonSignature, s2: PythonSignature) -> bool: + """Returns True if s1 < s2 in the partial order.""" + args1, args2 = s1.arguments(skip_outputs=True), s2.arguments(skip_outputs=True) + if len(args1) != len(args2): + return False + # TODO: should use some canonical form instead of 'str(arg.type)' - see comments + # above. The old codegen used the deprecated 'dynamic_type(arg.type)', which + # ignores the optional annotation, i.e. 'Scalar' and 'Scalar?'. + equal = all(arg1.type == arg2.type for arg1, arg2 in zip(args1, args2)) + smaller_or_equal = all( + str(arg1.type) == str(arg2.type) or is_arg_smaller(arg1.type, arg2.type) + for arg1, arg2 in zip(args1, args2) + ) + return smaller_or_equal and not equal + + # First sort by signature + grouped_overloads = sorted( + grouped_overloads, key=lambda x: x.signature.signature_str(symint=symint) + ) + + # Construct the relation graph + larger_than: dict[int, set[int]] = defaultdict(set) + for i1, overload1 in enumerate(grouped_overloads): + for i2, overload2 in enumerate(grouped_overloads): + if is_smaller(overload1.signature, overload2.signature): + larger_than[i1].add(i2) + + if not larger_than: + return list(grouped_overloads) + + # Use a topological sort to sort overloads according to the partial order. + N = len(grouped_overloads) + sorted_ids: list[int] = list(filter(lambda x: x not in larger_than, range(N))) + + for idx in range(N): + # The size of sorted_ids will grow to N eventually. + i = sorted_ids[idx] + for j in sorted(larger_than.keys()): + larger = larger_than[j] + larger.discard(i) + if not larger: + del larger_than[j] + sorted_ids.append(j) + + return [grouped_overloads[x] for x in sorted_ids] + + +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # +# +# Codegen API Integration +# +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # + + +def emit_single_dispatch( + ps: PythonSignature, + f: NativeFunction, + structseq_typenames: dict[str, str], + *, + symint: bool = True, +) -> str: + """ + Emit dispatch code for a single native function. 
+    """
+
+    @with_native_function
+    def go(f: NativeFunction) -> str:
+        # header comments
+        if isinstance(ps, PythonSignatureDeprecated):
+            schema_comment = f"// [deprecated] aten::{ps.deprecated_schema}"
+        else:
+            schema_comment = f"// aten::{f.func}"
+
+        # dispatch lambda signature
+        name = cpp.name(f.func)
+        lambda_formals = ", ".join(
+            f"{a.type_str} {a.name}" for a in dispatch_lambda_args(ps, f, symint=symint)
+        )
+        lambda_return = dispatch_lambda_return_str(f)
+
+        # dispatch lambda body
+        dispatch_callee = cpp_dispatch_target(f)
+        dispatch_args = ", ".join(cpp_dispatch_exprs(f, python_signature=ps))
+
+        # from arg parser outputs to dispatch lambda arguments
+        parser_outputs = arg_parser_output_exprs(ps, f, symint=symint)
+        lambda_arg_exprs = dispatch_lambda_exprs(ps, f, symint=symint)
+        inits = "\n".join(lambda_arg_exprs.inits)
+        lambda_args = ", ".join(lambda_arg_exprs.exprs)
+
+        # scatter fields
+        # TODO: Checking `ps.method and ('requires_grad' in parser_outputs)` is a hacky
+        # solution for enabling the 'requires_grad' argument for tensor methods
+        # new_full, new_empty, and new_zeros. A much better but more difficult to
+        # implement solution involves refactoring according to Ed's description here:
+        # https://github.com/pytorch/pytorch/issues/36455#issuecomment-614767589
+        need_set_requires_grad = ps.tensor_options_args and (
+            not has_tensor_options(f)
+            or (ps.method and ("requires_grad" in parser_outputs))
+        )
+        set_requires_grad = (
+            f".set_requires_grad({parser_outputs['requires_grad'].expr})"
+            if need_set_requires_grad
+            else ""
+        )
+
+        if lambda_return == "void":
+            # Make in-place foreach return `self` at python-binding level.
+            # ref: https://github.com/pytorch/pytorch/pull/118622#pullrequestreview-1904804954
+            self_arg = f.func.arguments.self_arg
+            return_stmt: str
+            if (
+                str(f.func.name).startswith("_foreach_")
+                and f.func.kind() == SchemaKind.inplace
+            ):
+                # note(crcrpar): `_foreach_pow.ScalarAndTensor` does NOT have an in-place
+                # variant and is unlikely to get one in the future. Thus it's safe to have the following assert.
+ assert self_arg is not None and is_tensor_list_type( + self_arg.argument.type + ) + return_stmt = """PyObject* self_tensorlist = _r.args[0]; +Py_INCREF(self_tensorlist); +return self_tensorlist; +""" + else: + return_stmt = "Py_RETURN_NONE;" + return f"""\ +{schema_comment} +{inits} +auto dispatch_{name} = []({lambda_formals}) -> {lambda_return} {{ + pybind11::gil_scoped_release no_gil; + {dispatch_callee}({dispatch_args}); +}}; +dispatch_{name}({lambda_args}){set_requires_grad}; +{return_stmt} +""" + else: + typename = structseq_typenames.get(gen_structseq_typename_key(f)) + structseq_typeref = f"{typename}, " if typename is not None else "" + return f"""\ +{schema_comment} +{inits} +auto dispatch_{name} = []({lambda_formals}) -> {lambda_return} {{ + pybind11::gil_scoped_release no_gil; + return {dispatch_callee}({dispatch_args}); +}}; +return wrap({structseq_typeref}dispatch_{name}({lambda_args}){set_requires_grad}); +""" + + return go(f) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_trace_type.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_trace_type.py new file mode 100644 index 0000000000000000000000000000000000000000..0a4ecbd14f514851610c27a4d810b88db934d4df --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_trace_type.py @@ -0,0 +1,540 @@ +from __future__ import annotations + +import itertools +from typing import TYPE_CHECKING + +from torchgen.api import cpp +from torchgen.api.types import DispatcherSignature +from torchgen.code_template import CodeTemplate +from torchgen.context import with_native_function +from torchgen.model import Argument, NativeFunction, SchemaKind, TensorOptionsArguments +from torchgen.utils import FileManager + + +if TYPE_CHECKING: + from collections.abc import Sequence + + +# Note [Manual Backend kernels] +# For these ops, we want to manually register to dispatch key Backend and +# skip codegen-ed registration to all keys before Backend. +# For codegen this means: +# - op set below must match ops with manual_kernel_registration=True in native_functions.yaml +# where we skip codegen backend kernels +# - all ops below are part of MANUAL_AUTOGRAD to skip codegen Autograd kernel registration +# - all ops below are part of MANUAL_TRACER to skip codegen Tracer kernel registration +# Note: we still register to dispatch key Profiler for these ops, keeping it untouched for now. +# You can find the manual registration in torch/csrc/autograd/VariableTypeManual.cpp +MANUAL_BACKEND = { + "options", + "data", + "set_data", + "is_leaf", + "output_nr", + "_version", + "retain_grad", + "_backward", + "requires_grad_", +} + +# For these ops we want to skip the codegen-ed registration to both Autograd and Tracer keys. +# You can find the manual registration in torch/csrc/autograd/VariableTypeManual.cpp +MANUAL_AUTOGRAD_AND_TRACER = { + "resize_", + "resize_as_", + "detach", + "detach_", + "copy_", + "_fw_primal", + "_make_dual", +} + +# Currently MANUAL_AUTOGRAD and MANUAL_TRACER share the same set of ops: +# union(MANUAL_BACKEND, MANUAL_AUTOGRAD_AND_TRACER) +# You can find the manual registration in torch/csrc/autograd/VariableTypeManual.cpp +MANUAL_AUTOGRAD = MANUAL_TRACER = MANUAL_BACKEND | MANUAL_AUTOGRAD_AND_TRACER + +# These functions we don't want to record for tracing, because we always want +# to trace their constituent parts. 
This is a temporary hack in lieu
+# of proper scopes, where subsequent compilation passes can ask for the unfolding
+# on demand. Only concrete ATen methods can be disabled this way; it will have
+# NO EFFECT otherwise.
+DONT_RECORD_TRACE = {
+    "convolution",
+    "conv1d",
+    "conv2d",
+    "conv3d",
+    "conv_transpose1d",
+    "conv_transpose2d",
+    "conv_transpose3d",
+    "lstm_cell",
+    "gru_cell",
+    "rnn_tanh_cell",
+    "rnn_relu_cell",
+    # FIXME: figure out a better way when we support sparse tensors in jit
+    "_coalesced",
+}
+
+
+def should_trace(f: NativeFunction) -> bool:
+    # Operations involving Storage or Type are not traceable at the moment
+    if any(
+        str(arg.type) in {"Storage", "Type"} for arg in f.func.schema_order_arguments()
+    ):
+        return False
+    # We can't trace functions which don't have any Tensor or TensorList returns
+    if not any(r.type.is_tensor_like() for r in f.func.returns):
+        return False
+    return f.func.name.name.base not in DONT_RECORD_TRACE
+
+
+SELECT = CodeTemplate(
+    """\
+
+if (${cond}) {
+  ${true}
+} else {
+  ${false}
+}
+"""
+)
+
+OP_NAME = CodeTemplate(
+    """\
+op_name = c10::Symbol::fromQualString("aten::${trace_name}");
+"""
+)
+
+# These functions have their names recorded in the trace under a renamed op:
+RENAME_TRACE = {
+    "zero": "zeros_like",  # replacing aten::zero_ with aten::zeros_like
+    "fill": "full_like",  # replacing aten::fill_ with aten::full_like
+}
+
+
+def format_trace_op_name(f: NativeFunction) -> str:
+    # TODO: byte-for-byte compatible with old codegen behavior - should clean up
+    if (
+        f.func.kind() in (SchemaKind.functional, SchemaKind.out)
+        or f.func.name.name.dunder_method
+    ):
+        # special case for *_out functions: the in-place and out-of-place ops
+        # are overloaded with the same name in the JIT
+        trace_name = str(f.func.name.name)
+        trace_name = RENAME_TRACE.get(trace_name, trace_name)
+        return OP_NAME.substitute(trace_name=trace_name)
+
+    # otherwise, this is an in-place op and we need to emit both in- and
+    # out-of-place versions
+    outplace_trace_name = f.func.name.name.base
+    inplace_trace_name = cpp.name(f.func)
+    outplace_trace_name = RENAME_TRACE.get(outplace_trace_name, outplace_trace_name)
+    inplace_trace_name = RENAME_TRACE.get(inplace_trace_name, inplace_trace_name)
+
+    return SELECT.substitute(
+        cond="tracer_state->force_outplace",
+        true=OP_NAME.substitute(trace_name=outplace_trace_name),
+        false=OP_NAME.substitute(trace_name=inplace_trace_name),
+    )
+
+
+ADD_TRACE_INPUT = CodeTemplate("""jit::tracer::addInputs(node, "${name}", ${input});""")
+
+
+def format_trace_inputs(f: NativeFunction) -> str:
+    def dispatch_trace_input(arg: Argument | TensorOptionsArguments) -> Sequence[str]:
+        if isinstance(arg, TensorOptionsArguments):
+            name = "options"
+            return [
+                ADD_TRACE_INPUT.substitute(
+                    name=name, input="c10::optTypeMetaToScalarType(options.dtype_opt())"
+                ),
+                ADD_TRACE_INPUT.substitute(name=name, input="options.layout()"),
+                ADD_TRACE_INPUT.substitute(name=name, input="options.device()"),
+                ADD_TRACE_INPUT.substitute(name=name, input="options.pinned_memory()"),
+            ]
+        else:
+            name = arg.name
+            if str(arg.type) == "Tensor?[]":
+                return [f'jit::tracer::addInputs(node, "{name}", {name});']
+            else:
+                return [ADD_TRACE_INPUT.substitute(name=name, input=name)]
+
+    args: list[Argument | TensorOptionsArguments] = list(
+        f.func.schema_order_arguments()
+    )
+
+    if f.func.is_out_fn():
+        # *_out functions take the result as a separate argument, but we don't want to
+        # trace that argument directly. Instead, we trace its TensorOptions.
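+        # (Editorial sketch, not upstream code: for a hypothetical
+        #  `add.out(self, other, *, out)`, `self` and `other` are traced as
+        #  inputs; `out` itself is appended only in the inplace branch below,
+        #  while factory-style _out ops instead trace the dtype/layout/device/
+        #  pin_memory components of `out.options()` in the outplace branch.)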
+        # So first, we need to remove the out argument from the list of arguments to trace.
+        num_out_args = len(f.func.arguments.out)
+        args = args[:-num_out_args]
+
+    trace_inputs = itertools.chain.from_iterable(
+        dispatch_trace_input(arg) for arg in args
+    )
+
+    if f.func.is_out_fn():
+        # for *_out functions, handle the result argument differently for inplace/outplace.
+        # For inplace: just add the input to the end to conform to the JIT schema
+        inplace = [
+            ADD_TRACE_INPUT.substitute(
+                name=f.func.arguments.out[i].name, input=f.func.arguments.out[i].name
+            )
+            # pyrefly: ignore [unbound-name]
+            for i in range(num_out_args)
+        ]
+
+        # for outplace: do nothing, except if the function is a factory.
+        # Factories are a bit special because their out-of-place overloads
+        # take an extra TensorOptions argument, which is missing in the _out function
+        has_tensor_return = any(r.type.is_tensor_like() for r in f.func.returns)
+        has_tensor_input_arg = any(
+            a.type.is_tensor_like() for a in f.func.arguments.flat_non_out
+        )
+        is_factory_method = f.category_override == "factory" or (
+            has_tensor_return and not has_tensor_input_arg
+        )
+
+        # HACK: preserve old codegen behavior - the old codegen set the `is_factory_method`
+        # flag for the whole family of ops with the same basename if any of them is a
+        # factory method. For most cases the whole family of ops are indeed all factory
+        # methods - 'normal' is the only exception. So we handle it specially here to avoid
+        # cloning the old logic.
+        if f.func.name.name.base == "normal":
+            is_factory_method = True
+
+        if is_factory_method:
+            outplace = [
+                ADD_TRACE_INPUT.substitute(
+                    name="out",
+                    input="c10::optTypeMetaToScalarType(out.options().dtype_opt())",
+                ),
+                ADD_TRACE_INPUT.substitute(name="out", input="out.options().layout()"),
+                ADD_TRACE_INPUT.substitute(name="out", input="out.options().device()"),
+                ADD_TRACE_INPUT.substitute(
+                    name="out", input="out.options().pinned_memory()"
+                ),
+            ]
+        else:
+            outplace = []
+
+        trace_inputs = itertools.chain(
+            trace_inputs,
+            [
+                SELECT.substitute(
+                    cond="tracer_state->force_outplace",
+                    true="\n".join(outplace),
+                    false="\n".join(inplace),
+                )
+            ],
+        )
+
+    return "\n".join(trace_inputs)
+
+
+# `torch.jit.trace` has an undocumented keyword argument `_force_outplace`,
+# which forces the jit to replace functions with out-of-place variants (for
+# example, `aten::add_` becomes `aten::add`).
+#
+# This replacement is implemented in place with minimal modification of the
+# argument stack (it assumes that the outplace call takes the same arguments
+# as the inplace version).
+#
+# However, no such substitutions are available for the `aten::fill_`
+# and `aten::zero_` operators, as we never implemented `aten::fill`
+# and `aten::zero`. So the jit tracing hack replaces `aten::zero_` with
+# `aten::zeros_like` and `aten::fill_` with `aten::full_like`.
+#
+# But as these can potentially take different arguments, we also have
+# to hack into the stack and add the missing ones.
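+#
+# (Editorial sketch of the effect: tracing `x.zero_()` under _force_outplace
+#  records roughly
+#
+#    aten::zeros_like(x, dtype=None, layout=None, device=None,
+#                     pin_memory=None, memory_format=MemoryFormat::Preserve)
+#
+#  where the trailing options/memory_format inputs are exactly the ones
+#  appended by RENAME_TRACE_ADD_ARGS below.)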
+# +# A possible alternative would be: +# +# - Add `aten::fill` and `aten::zero` +# +# - Or keep `aten::zeros_like` arguments aligned with `aten::zero_` +# arguments (inside of the `native_functions.yaml`) +RENAME_TRACE_ADD_ARGS = { + "fill": """\ + jit::tracer::addInputs(node, "options", ::std::optional()); + jit::tracer::addInputs(node, "options", layout_or_default(::std::nullopt)); + jit::tracer::addInputs(node, "options", device_or_default(::std::nullopt)); + jit::tracer::addInputs(node, "options", pinned_memory_or_default(::std::nullopt)); + ::std::optional memory_format = c10::MemoryFormat::Preserve; + jit::tracer::addInputs(node, "memory_format", memory_format); +""", + "zero": """\ + jit::tracer::addInputs(node, "options", ::std::optional()); + jit::tracer::addInputs(node, "options", layout_or_default(::std::nullopt)); + jit::tracer::addInputs(node, "options", device_or_default(::std::nullopt)); + jit::tracer::addInputs(node, "options", pinned_memory_or_default(::std::nullopt)); + ::std::optional memory_format = c10::MemoryFormat::Preserve; + jit::tracer::addInputs(node, "memory_format", memory_format); +""", +} + +INPLACE_GUARD = CodeTemplate( + """\ +jit::tracer::ensureUniqueIfOutOfPlaced("${name}", ${mutable_input}); +""" +) + +PRE_RECORD_TRACE = CodeTemplate( + """\ +torch::jit::Node* node = nullptr; +std::shared_ptr tracer_state; +if (jit::tracer::isTracing()) { + tracer_state = jit::tracer::getTracingState(); + at::Symbol op_name; + ${set_op_name} + node = tracer_state->createNode(op_name, /*num_outputs=*/0); + jit::tracer::recordSourceLocation(node); + ${add_trace_inputs} + tracer_state->insertNode(node); + ${inplace_guard} + jit::tracer::setTracingState(nullptr); +} +""" +) + + +def format_prerecord_trace(f: NativeFunction) -> str: + if not should_trace(f): + return "" + + # TODO: clean up old codegen behavior + is_inplace = ( + f.func.kind() in (SchemaKind.inplace, SchemaKind.out) + and not f.func.name.name.dunder_method + ) + add_args = ( + RENAME_TRACE_ADD_ARGS.get(f.func.name.name.base, "") if is_inplace else "" + ) + additional_inputs = ( + SELECT.substitute( + cond="tracer_state->force_outplace", + true=add_args, + false="", + ) + if add_args + else "" + ) + + return PRE_RECORD_TRACE.substitute( + set_op_name=format_trace_op_name(f), + add_trace_inputs=format_trace_inputs(f) + additional_inputs, + inplace_guard=INPLACE_GUARD.substitute( + name=cpp.name(f.func), + mutable_input=f.func.arguments.out[0].name + if f.func.arguments.out + else "self", + ) + if is_inplace + else "", + ) + + +POST_RECORD_TRACE = CodeTemplate( + """\ +if (tracer_state) { + jit::tracer::setTracingState(std::move(tracer_state)); + ${add_trace_outputs} +} +""" +) + + +def format_postrecord_trace(f: NativeFunction) -> str: + if not should_trace(f): + return "" + + # For outplacing ops, *_out overloads require special handling to move the + # output *argument* to a return value + if f.func.is_out_fn(): + output_names_outplace = [arg.name for arg in f.func.arguments.out] + output_names_inplace = cpp.return_names(f) + + # Code size optimization: the common case is that the return value is + # the same for both variants + if output_names_outplace == output_names_inplace: + outputs = [ + f"jit::tracer::addOutput(node, {n});" for n in output_names_outplace + ] + return POST_RECORD_TRACE.substitute(add_trace_outputs=outputs) + + selection = SELECT.substitute( + cond="force_outplace", + true="\n".join( + f"jit::tracer::addOutput(node, {n});" for n in output_names_outplace + ), + false="\n".join( + 
f"jit::tracer::addOutput(node, {n});" for n in output_names_inplace + ), + ) + return POST_RECORD_TRACE.substitute(add_trace_outputs=selection) + else: + output_names = cpp.return_names(f) + outputs = [f"jit::tracer::addOutput(node, {n});" for n in output_names] + return POST_RECORD_TRACE.substitute(add_trace_outputs=outputs) + + +def tie_return_values(f: NativeFunction) -> str: + if len(f.func.returns) == 1: + return f"auto {f.func.returns[0].name or 'result'}" + names = cpp.return_names(f) + return f"auto [{', '.join(names)}]" + + +def get_return_value(f: NativeFunction) -> str: + names = cpp.return_names(f) + if len(f.func.returns) == 1: + return names[0] + if f.func.kind() == SchemaKind.out: + return f"std::forward_as_tuple({', '.join(names)})" + else: + moved = ", ".join(f"std::move({name})" for name in names) + return f"std::make_tuple({moved})" + + +TRACE_DISPATCH = CodeTemplate( + """\ +${assign_return_values}at::_ops::${unambiguous_name}::redispatch(${unpacked_args});""" +) + + +def emit_trace_body(f: NativeFunction) -> list[str]: + trace_body: list[str] = [] + + trace_body.append(format_prerecord_trace(f)) + + dispatcher_sig = DispatcherSignature.from_schema(f.func) + dispatcher_exprs = dispatcher_sig.exprs() + + # code-generated tracing kernels plumb and recompute dispatch keys directly through the kernel for performance. + # See Note [Plumbing Keys Through The Dispatcher] for details. + dispatch_key_set = "ks & c10::DispatchKeySet(c10::DispatchKeySet::FULL_AFTER, c10::DispatchKey::Tracer)" + redispatch_args = ", ".join([dispatch_key_set] + [a.expr for a in dispatcher_exprs]) + + assign_return_values = ( + f"{tie_return_values(f)} = " + if f.func.kind() in [SchemaKind.functional, SchemaKind.mutable] + and f.func.returns + else "" + ) + + # Note that this calls the slow, dispatching variants of manual_cpp_binding ops. + # We could probably work harder to ensure that the fast variants are + # called instead, but the perf benefit would be minimal. + trace_body.append( + TRACE_DISPATCH.substitute( + assign_return_values=assign_return_values, + unambiguous_name=f.func.name.unambiguous_name(), + unpacked_args=redispatch_args, + ) + ) + + trace_body.append(format_postrecord_trace(f)) + if f.func.returns: + trace_body.append(f"return {get_return_value(f)};") + return trace_body + + +METHOD_DEFINITION = CodeTemplate( + """\ +${return_type} ${type_wrapper_name}(${formals}) { + ${type_definition_body} +} +""" +) + + +def type_wrapper_name(f: NativeFunction, key: str = "Default") -> str: + if f.func.name.overload_name: + name = f"{cpp.name(f.func)}_{f.func.name.overload_name}" + else: + name = cpp.name(f.func) + + # The key argument is only used in gen_variable_type where we need fns per autograd dispatch key. + # In gen_trace_type and gen_inplace_view_type where only one fn per native_fn must be generated, + # the key argument should not be passed. + # We do not append key if it is Default so that generated functions from + # before per-dispatch-key derivatives were added retain the same names. + if key != "Default": + name = name + f"_{key}" + return name + + +@with_native_function +def method_definition(f: NativeFunction) -> str: + assert cpp.name(f.func) not in MANUAL_TRACER + + formals = ", ".join( + # code-generated tracing kernels plumb and recompute dispatch keys directly through the kernel for performance. + # See Note [Plumbing Keys Through The Dispatcher] for details. 
+ ["c10::DispatchKeySet ks"] + + [ + f"{cpp.argument_type(a, binds='__placeholder__', symint=True).cpp_type()} {a.name}" + for a in f.func.schema_order_arguments() + ] + ) + + return METHOD_DEFINITION.substitute( + return_type=cpp.returns_type(f.func.returns, symint=True).cpp_type(), + type_wrapper_name=type_wrapper_name(f), + formals=formals, + type_definition_body=emit_trace_body(f), + ) + + +WRAPPER_REGISTRATION = CodeTemplate( + """\ +m.impl("${name}", + TORCH_FN(${class_type}::${type_wrapper_name}) +); +""" +) + + +@with_native_function +def method_registration(f: NativeFunction) -> str: + assert cpp.name(f.func) not in MANUAL_TRACER + + return WRAPPER_REGISTRATION.substitute( + name=f.func.name, + type_wrapper_name=type_wrapper_name(f), + class_type="TraceType", + ) + + +def gen_trace_type_func(fn: NativeFunction) -> dict[str, list[str]]: + return { + "ops_headers": [f"#include "], + "trace_method_definitions": [method_definition(fn)], + "trace_wrapper_registrations": [method_registration(fn)], + } + + +def gen_trace_type( + out: str, native_functions: list[NativeFunction], template_path: str +) -> None: + # NOTE: see Note [Sharded File] at the top of the VariableType.cpp + # template regarding sharding of the generated files. + fm = FileManager(install_dir=out, template_dir=template_path, dry_run=False) + fm.write_sharded( + "TraceType.cpp", + [fn for fn in native_functions if cpp.name(fn.func) not in MANUAL_TRACER], + key_fn=lambda fn: fn.root_name, + base_env={ + "generated_comment": "@" + + f"generated from {fm.template_dir_for_comments()}/TraceType.cpp", + }, + env_callable=gen_trace_type_func, + num_shards=5, + sharded_keys={ + "ops_headers", + "trace_method_definitions", + "trace_wrapper_registrations", + }, + ) diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_variable_factories.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_variable_factories.py new file mode 100644 index 0000000000000000000000000000000000000000..9916a77385d38f01e83416d4303cb17ac17de700 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_variable_factories.py @@ -0,0 +1,116 @@ +# Generates C++ functions that wrap ATen tensor factory methods to turn them into Variables. +# +# This writes one file: variable_factories.h + +from __future__ import annotations + +import re + +import torchgen.api.python as python +from torchgen.api import cpp +from torchgen.api.types import CppSignatureGroup +from torchgen.context import with_native_function +from torchgen.gen import parse_native_yaml +from torchgen.model import NativeFunction, TensorOptionsArguments, Variant +from torchgen.utils import FileManager, mapMaybe + + +OPTIONAL_TYPE_PATTERN = re.compile(r"std::optional<(.+)>") +TYPE_PATTERN = re.compile(r"(?:const\s+)?([A-Z]\w+)") + + +# Add 'at::' to types defined in ATen namespace, e.g. Tensor, TensorList, IntArrayRef and etc. +# TODO: maybe update the cpp argument API to take optional namespace argument? 
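+#
+# (Editorial examples of the transformation below, given the two regexes above:
+#    "Tensor"                -> "at::Tensor"
+#    "const Tensor &"        -> "const at::Tensor &"
+#    "std::optional<Tensor>" -> "std::optional<at::Tensor>"
+#    "int64_t"               -> "int64_t"   # lowercase types are left as-is
+# )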
+def fully_qualified_type(argument_type: str) -> str: + def maybe_optional_type(type: str, is_opt: bool) -> str: + return f"std::optional<{type}>" if is_opt else type + + opt_match = OPTIONAL_TYPE_PATTERN.match(argument_type) + is_opt = opt_match is not None + if opt_match: + argument_type = argument_type[opt_match.start(1) : opt_match.end(1)] + match = TYPE_PATTERN.match(argument_type) + if match is None: + return maybe_optional_type(argument_type, is_opt) + index = match.start(1) + qualified_type = f"{argument_type[:index]}at::{argument_type[index:]}" + return maybe_optional_type(qualified_type, is_opt) + + +def gen_variable_factories( + out: str, native_yaml_path: str, tags_yaml_path: str, template_path: str +) -> None: + native_functions = parse_native_yaml( + native_yaml_path, tags_yaml_path + ).native_functions + factory_functions = [fn for fn in native_functions if is_factory_function(fn)] + fm = FileManager(install_dir=out, template_dir=template_path, dry_run=False) + fm.write_with_template( + "variable_factories.h", + "variable_factories.h", + lambda: { + "generated_comment": "@" + + f"generated from {fm.template_dir_for_comments()}/variable_factories.h", + "ops_headers": [ + f"#include " for fn in factory_functions + ], + "function_definitions": list(mapMaybe(process_function, factory_functions)), + }, + ) + + +@with_native_function +def is_factory_function(f: NativeFunction) -> bool: + if Variant.function not in f.variants: + return False + + name = cpp.name(f.func) + has_tensor_options = python.has_tensor_options(f) + return has_tensor_options or name.endswith("_like") + + +@with_native_function +def process_function(f: NativeFunction) -> str | None: + name = cpp.name(f.func) + has_tensor_options = python.has_tensor_options(f) + is_factory = has_tensor_options or name.endswith("_like") + + if Variant.function not in f.variants or not is_factory: + return None + + cpp_sigs = CppSignatureGroup.from_native_function(f, method=False) + sigs = [cpp_sigs.signature] + if cpp_sigs.symint_signature is not None: + sigs.append(cpp_sigs.symint_signature) + r = "" + for sig in sigs: + formals: list[str] = [] + exprs: list[str] = [] + requires_grad = "false" + for arg in sig.arguments(): + qualified_type = fully_qualified_type(arg.type) + if arg.default: + formals.append(f"{qualified_type} {arg.name} = {arg.default}") + else: + formals.append(f"{qualified_type} {arg.name}") + + if isinstance(arg.argument, TensorOptionsArguments): + # note: we remove the requires_grad setting from the TensorOptions because + # it is ignored anyways (and we actually have an assertion that it isn't set + # which would fail otherwise). We handle requires_grad explicitly here + # instead of passing it through to the kernel. + exprs.append( + f"at::TensorOptions({arg.name}).requires_grad(::std::nullopt)" + ) + # Manually set the requires_grad bit on the result tensor. 
+                requires_grad = f"{arg.name}.requires_grad()"
+            else:
+                exprs.append(arg.name)
+
+        r += f"""\
+inline at::Tensor {sig.name()}({", ".join(formals)}) {{
+  at::AutoDispatchBelowADInplaceOrView guard;
+  return autograd::make_variable(at::{sig.name()}({", ".join(exprs)}), /*requires_grad=*/{requires_grad});
+}}
+"""
+    return r
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_variable_type.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_variable_type.py
new file mode 100644
index 0000000000000000000000000000000000000000..4b6ce65bb0bffdbf5c92759ebe55f173a494828f
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_variable_type.py
@@ -0,0 +1,2203 @@
+# Generates VariableType.h/cpp
+#
+# **If any changes are being made to the VariableType codegen please also check
+# if updates are needed in torch/csrc/autograd/autograd_not_implemented_fallback.cpp
+#
+# VariableType is a subclass of at::Type that provides the binding code
+# necessary to provide a differentiable version of ATen operators. There are a
+# number of different things we could mean:
+#
+# - Given a non-differentiable forward implementation, we might
+#   directly associate it with a backward implementation to make
+#   it differentiable. This is the common case.
+#
+# - Some functions don't need a backward implementation, because
+#   backpropagation will never propagate beyond them. There are a
+#   number of different reasons why this may be the case:
+#
+#     - The function has no differentiable inputs
+#     - The function's output is not differentiable
+#     - The function has no data dependency on its input
+#
+# - Some functions don't need a backward implementation because they
+#   are implemented as a composition of other (differentiable) ATen
+#   functions. These are dispatched directly to the Type superclass,
+#   which will in turn dispatch back to VariableType for its
+#   differentiable subcomponents.
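+#
+#   (Editorial example: an op whose C++ implementation simply calls other
+#   differentiable ATen ops, e.g. a composite written as `x * sigmoid(x)`,
+#   needs no derivative formula of its own; autograd records each
+#   constituent call instead.)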
+# + +from __future__ import annotations + +import re +from typing import TYPE_CHECKING + +from torchgen.api import cpp +from torchgen.api.autograd import ( + DifferentiableInput, + dispatch_strategy, + ForwardDerivative, + gen_differentiable_outputs, + is_differentiable, + NativeFunctionWithDifferentiabilityInfo, + SavedAttribute, +) +from torchgen.api.types import ( + ArrayRefCType, + BaseCppType, + BaseCType, + Binding, + intArrayRefT, + iTensorListRefT, + ListCType, + MutRefCType, + OptionalCType, + scalarT, + SpecialArgName, + stringT, + symIntArrayRefT, + TENSOR_LIST_LIKE_CTYPES, + tensorListT, + tensorT, + TupleCType, + VectorCType, +) +from torchgen.code_template import CodeTemplate +from torchgen.context import ( + native_function_manager, + with_native_function, + with_native_function_and, +) +from torchgen.model import ( + Argument, + BaseType, + ListType, + NativeFunction, + SchemaKind, + SelfArgument, + TensorOptionsArguments, +) +from torchgen.utils import FileManager, mapMaybe + +from .context import with_native_function_with_differentiability_info_and_key +from .gen_inplace_or_view_type import ( + ALL_VIEW_FUNCTIONS, + ASSIGN_RETURN_VALUE, + AUTOGRAD_NOT_IMPLEMENTED_REGISTRATION, + gen_formals, + get_base_name, + get_view_info, + is_tensor_list_type, + is_tensor_type, + METHOD_DEFINITION, + modifies_arguments, + TMP_VAR, + unpack_args, + unpacked_name, + use_derived, + WRAPPER_REGISTRATION, +) +from .gen_trace_type import ( + get_return_value, + MANUAL_AUTOGRAD_AND_TRACER, + MANUAL_BACKEND, + tie_return_values, + type_wrapper_name, +) + + +if TYPE_CHECKING: + from collections.abc import Callable, Sequence + + +# We don't set or modify grad_fn on these methods. Generally, they return +# tensors that have requires_grad=False. In-place functions listed here will +# not examine or modify requires_grad or grad_fn. +# NB: this does NOT include overload name +DONT_REQUIRE_DERIVATIVE = { + # These only depend on the input Tensor's shape and device, not the data + "empty_like", + "ones_like", + "full_like", + "zeros_like", + "rand_like", + "randn_like", + "new_empty", + "new_empty_strided", + "new_full", + "new_zeros", + "new_ones", + # These are only implemented on integral types + "__and__", + "__iand__", + "__ilshift__", + "__ior__", + "__irshift__", + "__ixor__", + "__lshift__", + "__or__", + "__rshift__", + "__xor__", + # These work on integral data types, and hence don't require derivative + "_sobol_engine_draw", + "_sobol_engine_ff", + "_sobol_engine_scramble_", + "_sobol_engine_initialize_state_", + # This is an unsafe method that is meant to be out of reach of autograd. + "_coalesced_", + # Quantize functions should not record gradients + "quantize_per_tensor", + "quantize_per_channel", + # Functions that return integers should not have output that require gradients + "argmax", + "argmin", + "argsort", + "searchsorted", + "bucketize", + # Functions that return booleans are not differentiable + "isnan", + "isposinf", + "isneginf", + "isinf", + "signbit", + "isin", + "allclose", + # Functions return none are not differentiable + "record_stream", + # These functions are not differentiable + "logical_and", + "logical_xor", + "logical_not", + "logical_or", + # This function returns nested_tensor shape as a tensor that is non-differentiable + "_nested_tensor_size", + "_nested_tensor_strides", + "_nested_tensor_storage_offsets", +} + +# The C -> R functions at the time of adding this are still being audited and tested +# but will not error out. 
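+#
+# (Editorial note: "C -> C", "R -> C", etc. describe an op's domain and
+#  codomain over the complex numbers. PyTorch's complex autograd follows the
+#  conjugate Wirtinger derivative convention, so the entries below have been
+#  validated under that convention; see the autograd notes in the docs.)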
+# C -> C, R -> C functions for which backward is correctly implemented and tested +GRADIENT_IMPLEMENTED_FOR_COMPLEX = { + "fill", + "t", + "t_copy", + "view", + "reshape", + "reshape_as", + "view_as", + "view_copy", + "roll", + "clone", + "block_diag", + "diag_embed", + "repeat", + "expand", + "expand_copy", + "flip", + "fliplr", + "flipud", + "rot90", + "nanmean", + "nansum", + "transpose", + "transpose_copy", + "permute", + "permute_copy", + "squeeze", + "squeeze_copy", + "unsqueeze", + "unsqueeze_copy", + "resize", + "resize_as", + "tril", + "triu", + "chunk", + "zero_", + "eq_", + "ne_", + "add", + "__radd__", + "sum", + "_conj", + "sin", + "cos", + "mul", + "sinc", + "sinh", + "cosh", + "__rmul__", + "sgn", + "asin", + "acos", + "sub", + "div", + "cat", + "view_as_complex", + "index_put", + "neg", + "complex", + "select", + "where", + "as_strided", + "as_strided_copy", + "as_strided_scatter", + "slice", + "constant_pad_nd", + "unbind", + "unbind_copy", + "split", + "split_with_sizes", + "unsafe_split", + "split_with_sizes_backward", + "dot", + "vdot", + "cholesky", + "triangular_solve", + "mm", + "_unsafe_view", + "mv", + "outer", + "bmm", + "diagonal", + "alias", + "atan", + "log", + "log10", + "log1p", + "log2", + "logaddexp", + "logsumexp", + "logcumsumexp", + "reciprocal", + "tan", + "pow", + "rsqrt", + "tanh", + "tanh_backward", + "asinh", + "acosh", + "atanh", + "take", + "fill_", + "exp", + "exp2", + "expm1", + "nonzero", + "mean", + "std_mean", + "var_mean", + "inverse", + "solve", + "linalg_cholesky", + "addcmul", + "addcdiv", + "matrix_exp", + "linalg_matrix_exp", + "_linalg_eigh", + "cholesky_solve", + "linalg_qr", + "_linalg_svd", + "_fft_c2c", + "_fft_r2c", + "linalg_solve", + "sqrt", + "stack", + "gather", + "index_select", + "index_add_", + "linalg_inv", + "linalg_inv_ex", + "baddbmm", + "addbmm", + "addmm", + "addmv", + "addr", + "linalg_householder_product", + "ormqr", + "reflection_pad1d", + "reflection_pad2d", + "reflection_pad3d", + "linalg_cholesky_ex", + "linalg_eig", + "diagonal_copy", + "diagonal_scatter", + "alias_copy", + "select_backward", + "diagonal_backward", + "slice_backward", + "reflection_pad1d_backward", + "reflection_pad2d_backward", + "reflection_pad3d_backward", + "_sparse_sparse_matmul", + "replication_pad1d", + "replication_pad2d", + "replication_pad3d", + "put", + "put_", + "_to_copy", + "replication_pad1d_backward", + "replication_pad2d_backward", + "replication_pad3d_backward", + "diag", + "masked_scatter", + "masked_select", + "index_add", + "index_fill", + "trace", + "polar", + "cumsum", + "rsub", + "eig", + "lerp", + "linalg_vector_norm", + "cumprod", + "prod", + "index_copy", + "lu", + "unfold", + "unfold_backward", + "index", + "masked_fill", + "masked_scatter_backward", + "linalg_cross", + "lu_unpack", + "renorm", + "_conj_physical", + "linalg_lu_factor_ex", + "scatter", + "scatter_add", + "sigmoid", + "sigmoid_backward", + "sparse_mask", + "trapezoid", + "cumulative_trapezoid", + "conj_physical_", + "_neg_view", + "_reshape_alias", + "_reshape_copy", + "_linalg_det", + "lu_solve", + "linalg_solve_triangular", + "linalg_pinv", + "linalg_lstsq", + "unfold_copy", + "col2im", + "im2col", + "cholesky_inverse", + "to_sparse", + "sparse_sampled_addmm", + "linalg_lu", + "pixel_shuffle", + "pixel_unshuffle", + "channel_shuffle", + "linalg_lu_solve", + "_linalg_slogdet", + "_linalg_solve_ex", + "_unsafe_index", + "_unsafe_index_put", + "_unsafe_masked_index", + "_unsafe_masked_index_put_accumulate", +} + +GRADIENT_IMPLEMENTED_FOR_SPARSE_COMPLEX 
= { + "_to_dense", + "_coalesce", + "coalesce", + "values", + "_sparse_coo_tensor_with_dims_and_tensors", + "_sparse_addmm", +} + +GRADIENT_IMPLEMENTED_FOR_COMPLEX.update(GRADIENT_IMPLEMENTED_FOR_SPARSE_COMPLEX) + +# Some operators invalidate the grad_accumulator. Let's reset it. +RESET_GRAD_ACCUMULATOR = {"set_", "resize_"} + +# NOTE [ TensorImpl and Storage Pointer Sanity Checks ] +# +# We check the following properties: +# 1) A function should never change the input tensors' underlying c10::TensorImpl +# pointers or c10::Storage pointers, even if it modifies its input tensors (via +# inplace or out-variants) +# If the function does not modify its arguments, we also check the following properties +# pertaining to its output: +# 2) Its TensorImpl has use_count of 1 (or 2 if it has a PyObject) +# 3) If the function is a view function, it has the same StorageImpl as that of +# the input it is aliased with. Otherwise, its StorageImpl has use_count of 1 +# +# The following code templates implement the checks for this invariant: +SAVE_TENSOR_STORAGE = CodeTemplate( + """\ +auto ${tensor_name}_storage_saved = + ${tensor_name}.has_storage() ? ::std::optional(${tensor_name}.storage()) : ::std::nullopt; +""" +) + + +# If tensor_name == out_tensor_name, used to enforce (1), otherwise used for (2) +ENFORCE_SAME_TENSOR_STORAGE = CodeTemplate( + """\ +if (${tensor_name}_storage_saved.has_value() && + !at::impl::dispatch_mode_enabled() && + !at::impl::tensor_has_dispatch(${tensor_name}) && + !at::impl::tensor_has_dispatch(${out_tensor_name})) + TORCH_INTERNAL_ASSERT(${tensor_name}_storage_saved.value().is_alias_of(${out_tensor_name}.storage())); +""" +) + +SAVE_TENSORLIST_STORAGE = CodeTemplate( + """\ +std::vector<::std::optional> ${tensorlist_name}_storage_saved(${tensorlist_name}.size()); +for (const Tensor& tensor : ${tensorlist_name}) + ${tensorlist_name}_storage_saved.push_back( + tensor.has_storage() ? ::std::optional(tensor.storage()) : ::std::nullopt); +""" +) + +ENFORCE_SAME_TENSORLIST_STORAGE = CodeTemplate( + """\ +for (size_t i=0; i<${tensorlist_name}.size() && !at::impl::dispatch_mode_enabled(); i++) { + if (${tensorlist_name}_storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(${tensorlist_name})) + TORCH_INTERNAL_ASSERT(${tensorlist_name}_storage_saved[i].value().is_alias_of(${tensorlist_name}[i].storage())); +} +""" +) + +SAVE_OPTIONALTENSORLIST_STORAGE = CodeTemplate( + """\ +std::vector<::std::optional> ${tensorlist_name}_storage_saved(${tensorlist_name}.size()); +for (const ::std::optional& tensor : ${tensorlist_name}) + ${tensorlist_name}_storage_saved.push_back( + tensor.has_value() && tensor->has_storage() ? 
::std::optional(tensor->storage()) : ::std::nullopt); +""" +) + +ENFORCE_SAME_OPTIONALTENSORLIST_STORAGE = CodeTemplate( + """\ +for (size_t i=0; i<${tensorlist_name}.size() && !at::impl::dispatch_mode_enabled(); i++) { + if (${tensorlist_name}_storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(${tensorlist_name})) + TORCH_INTERNAL_ASSERT(${tensorlist_name}_storage_saved[i].value().is_alias_of( + static_cast<::std::optional>(${tensorlist_name}[i])->storage())); +} +""" +) + +SAVE_TENSOR_IMPL = CodeTemplate( + """\ +c10::intrusive_ptr ${tensor_name}_impl_saved; +if (${tensor_name}.defined()) ${tensor_name}_impl_saved = ${tensor_name}.getIntrusivePtr(); +""" +) + +ENFORCE_SAME_TENSOR_IMPL = CodeTemplate( + """\ +if (${tensor_name}_impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(${tensor_name})) + TORCH_INTERNAL_ASSERT(${tensor_name}_impl_saved == ${tensor_name}.getIntrusivePtr()); +""" +) + +ENFORCE_TENSOR_IMPL_USE_COUNT = CodeTemplate( + """\ +if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(${tensor_name})) + TORCH_INTERNAL_ASSERT(${tensor_name}.use_count() == expected_fresh_use_count(${tensor_name}), "function: ${fn_name}"); +""" +) + +ENFORCE_TENSOR_STORAGE_USE_COUNT_EQUALS_ONE = CodeTemplate( + """\ +if (${tensor_name}.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(${tensor_name})) { + TORCH_INTERNAL_ASSERT(${tensor_name}.storage().use_count() == 1, "function: ${fn_name}"); +} +""" +) + +SAVE_TENSORLIST_IMPL = CodeTemplate( + """\ +std::vector> ${tensorlist_name}_impl_saved(${tensorlist_name}.size()); +for (size_t i=0; i<${tensorlist_name}.size(); i++) + if (${tensorlist_name}[i].defined()) ${tensorlist_name}_impl_saved[i] = ${tensorlist_name}[i].getIntrusivePtr(); +""" +) + +ENFORCE_SAME_TENSORLIST_IMPL = CodeTemplate( + """\ +for (size_t i=0; i<${tensorlist_name}.size() && !at::impl::dispatch_mode_enabled(); i++) { + if (${tensorlist_name}_impl_saved[i] && !at::impl::tensorlist_has_dispatch(${tensorlist_name})) + TORCH_INTERNAL_ASSERT(${tensorlist_name}_impl_saved[i] == ${tensorlist_name}[i].getIntrusivePtr()); +} +""" +) + +SAVE_OPTIONALTENSORLIST_IMPL = CodeTemplate( + """\ +std::vector> ${tensorlist_name}_impl_saved(${tensorlist_name}.size()); +for (size_t i=0; i<${tensorlist_name}.size(); i++) { + ::std::optional t = ${tensorlist_name}[i]; + if (t.has_value() && t->defined()) ${tensorlist_name}_impl_saved[i] = t->getIntrusivePtr(); +} +""" +) + +ENFORCE_SAME_OPTIONALTENSORLIST_IMPL = CodeTemplate( + """\ +for (size_t i=0; i<${tensorlist_name}.size() && !at::impl::dispatch_mode_enabled(); i++) { + if (${tensorlist_name}_impl_saved[i]) + TORCH_INTERNAL_ASSERT( + ${tensorlist_name}_impl_saved[i] == static_cast<::std::optional>(${tensorlist_name}[i])->getIntrusivePtr()); +} +""" +) + +# The following list contains functions that we don't enforce the invariant on. 
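+# (Editorial sketch of how the save/enforce template pair above expands for a
+#  tensor argument `self` in a debug build; illustrative, not verbatim output:
+#
+#    c10::intrusive_ptr<TensorImpl> self_impl_saved;
+#    if (self.defined()) self_impl_saved = self.getIntrusivePtr();
+#    // ... redispatch to the actual kernel ...
+#    if (self_impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self))
+#      TORCH_INTERNAL_ASSERT(self_impl_saved == self.getIntrusivePtr());
+# )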
+DONT_ENFORCE_SAME_TENSOR_IMPL_OR_STORAGE = { + # These functions are expected to change impl or storage of input tensors + "set_", + "_cudnn_rnn_flatten_weight", + "_unsafe_masked_index", + "_unsafe_masked_index_put_accumulate", +} +DONT_ENFORCE_TENSOR_IMPL_USE_COUNT = { + # These non-inplace, non-out functions return tensors with use_count > 1 + # Therefore, they MAY (but not necessarily) return one of its inputs as-is + # See https://github.com/pytorch/pytorch/issues/60426 for more information + "_embedding_bag", + "_embedding_bag_forward_only", + "q_per_channel_scales", + "q_per_channel_zero_points", + "lu_unpack", + "_cudnn_rnn_backward", + # The below failed StorageImpl use_count check but we skip tensor_impl check + # just in case + "_cudnn_rnn", + "dequantize_self", + # lift() should never actually be called with a requires_grad=True tensor, + "lift", + "lift_fresh", + "lift_fresh_copy", + # Nested Tensors related functions + # _nested_tensor_size() should never actually be called with requires_grad=True tensor + "_nested_tensor_size", + "_nested_tensor_strides", + "_nested_tensor_storage_offsets", +} + +DONT_ENFORCE_STORAGE_IMPL_USE_COUNT = { + # These non-view functions return tensors with storage use_count != 1 + "_slow_conv2d_forward", + "slow_conv3d_forward", + "channel_shuffle", + # If an input is returned as-is in output, we cannot guarantee its storage_impl + # use count to be 1 either. + *DONT_ENFORCE_TENSOR_IMPL_USE_COUNT, +} +# END CHECKS FOR [ TensorImpl and Storage Pointer Sanity Checks ] + +DECLARE_GRAD_FN = CodeTemplate( + """\ +std::shared_ptr<${op}> grad_fn; +""" +) + +DECLARE_VECTOR_OF_GRAD_FN = CodeTemplate( + """\ +std::vector> grad_fns; +""" +) + +SETUP_ANY_REQUIRES_GRAD = CodeTemplate( + """\ +[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( ${args_with_derivatives} ); +${extra_differentiability_conditions} +""" +) + +SETUP_DERIVATIVE = CodeTemplate( + """\ +if (_any_requires_grad) { + ${setup} +} +""" +) + +SETUP_NONE_REQUIRES_GRAD = CodeTemplate( + """\ +if (compute_requires_grad( ${args_to_check} )) { + throw_error_out_requires_grad("${base_name}"); +} +""" +) + +ASSIGN_GRAD_FN = CodeTemplate( + """\ +grad_fn = std::shared_ptr<${op}>(new ${op}(${op_ctor}), deleteNode); +grad_fn->set_next_edges(collect_next_edges( ${args_with_derivatives} )); +""" +) + +# note(crcrpar): `compute_requires_grad` in the template below is supplied with arguments indexed with `i` +# while the `SETUP_ANY_REQUIRES_GRAD` above takes whole tensors and scalars. +ASSIGN_VECTOR_OF_GRAD_FN = CodeTemplate( + """\ +for (const auto& i : c10::irange( ${irange} )) { + const auto ith_requires_grad = compute_requires_grad(${args_with_derivatives}); + check_inplace(self[i], ith_requires_grad); + grad_fns.push_back([&]() -> std::shared_ptr<${op}> { + if (!ith_requires_grad) { + return nullptr; + } else { + auto grad_fn = std::shared_ptr<${op}>(new ${op}(${op_ctor}), deleteNode); + grad_fn->set_next_edges(collect_next_edges( ${args_with_derivatives} )); + return grad_fn; + } + }()); +} +""" +) + +CALL_REDISPATCH = CodeTemplate( + """\ +at::redispatch::${api_name}(${unpacked_args})""" +) +# If the non-variable operation has return values, we use the `tmp` variable to hold the +# values temporarily and pass the values to the return variables outside of the +# `at::AutoDispatchBelowAutograd` guard block. 
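+# (Editorial sketch of what the second template below typically expands to for
+#  an op returning a Tensor; the op, argument names, and guard are illustrative:
+#
+#    auto _tmp = ([&]() {
+#      at::AutoDispatchBelowAutograd guard;
+#      return at::redispatch::mul(ks & c10::after_autograd_keyset, self_, other_);
+#    })();
+# )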
+DISPATCH_TO_NON_VAR_TYPE_WITH_TMP_RETURN_VALUES_JVP_DECOMP = CodeTemplate(
+    """\
+auto ${tmp_var} = ([&]() {
+  if (${any_has_forward_grad}) {
+    static c10::OperatorName full_name("aten::${op_name}", "${op_overload}");
+    static ::std::optional<c10::OperatorHandle> opt_op = c10::Dispatcher::singleton().findSchema(full_name);
+    return impl::run_jit_decomposition_with_args_for_jvp<${return_types}>("${op_name}", *opt_op, ks, ${arg_names});
+  } else {
+    ${guard}
+    return ${base_type_call};
+  }
+})();
+"""
+)
+
+DISPATCH_TO_NON_VAR_TYPE_WITH_TMP_RETURN_VALUES = CodeTemplate(
+    """\
+auto ${tmp_var} = ([&]() {
+  ${guard}
+  return ${base_type_call};
+})();
+"""
+)
+
+DISPATCH_TO_NON_VAR_TYPE_WITHOUT_RETURN_VALUES = CodeTemplate(
+    """\
+{
+  ${guard}
+  ${base_type_call};
+}
+"""
+)
+
+SET_HISTORY = CodeTemplate(
+    """\
+if (grad_fn) {
+    ${fn}_history(${differentiable_outputs}, grad_fn);
+}
+"""
+)
+
+LOOP_OVER_VECTOR_OF_GRAD_FNS = CodeTemplate(
+    """\
+if (!grad_fns.empty()) {
+  ${preamble}
+  for (const auto& i : c10::irange(grad_fns.size())) {
+    auto grad_fn = grad_fns[i];
+    if (grad_fn != nullptr) {
+      ${statements}
+    }
+  }
+}
+"""
+)
+
+CONDITIONAL = CodeTemplate(
+    """\
+if (${cond}) {
+  ${statements}
+}
+"""
+)
+
+RUN_ONLY_IN_DEBUG_MODE = CodeTemplate(
+    """\
+#ifndef NDEBUG
+${statements}
+#endif
+"""
+)
+
+FW_DERIVATIVE_CHECK_TEMPLATE = CodeTemplate(
+    """\
+isFwGradDefined(${req_inp})\
+"""
+)
+FW_DERIVATIVE_SIZE_CHECK_TEMPLATE = CodeTemplate(
+    """\
+TORCH_CHECK(
+    self.size() == ${inp_name}.size(),
+    "Tensor lists must have the same number of tensors, got ",
+    self.size(),
+    " and ",
+    ${inp_name}.size());
+"""
+)
+
+FW_DERIVATIVE_TENSORLIST_CHECK_TEMPLATE = CodeTemplate(
+    """\
+isFwGradDefinedTensorList(${req_inp})\
+"""
+)
+
+FW_DERIVATIVE_DEFINED_GRAD_TEMPLATE = CodeTemplate(
+    """\
+auto ${inp_name}_t_raw = toNonOptFwGrad(${inp});
+auto ${inp_name}_tensor = toNonOptTensor(${inp});
+auto ${inp_name}_t = (${inp_name}_t_raw.defined() || !${inp_name}_tensor.defined())
+  ? ${inp_name}_t_raw : at::${zeros_fn}(${inp_name}_tensor.sym_sizes(), ${inp_name}_tensor.options());
+"""
+)
+
+FW_DERIVATIVE_UPDATE_WRAPPED_NUM_TEMPLATE = CodeTemplate(
+    """\
+update_wrapped_number(${inp_name}_tensor, ${inp_name}_t);
+"""
+)
+
+FW_DERIVATIVE_DEFINED_PRIMAL_TEMPLATE = CodeTemplate(
+    """\
+auto ${inp_name}_p = toNonOptPrimal(${inp});
+"""
+)
+
+FW_DERIVATIVE_SETTER_TENSOR = CodeTemplate(
+    """\
+if (${out_arg}_new_fw_grad_opt.has_value() && ${out_arg}_new_fw_grad_opt.value().defined() && ${out_arg}.defined()) {
+  // The hardcoded 0 here will need to be updated once we support multiple levels.
+  ${out_arg}._set_fw_grad(${out_arg}_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ ${is_inplace});
+}
+"""
+)
+
+FW_DERIVATIVE_SETTER_TENSOR_FOREACH = CodeTemplate(
+    """\
+for (const auto& i : c10::irange(${out_arg}_new_fw_grad_opts.size())) {
+  auto& ${out_arg}_new_fw_grad_opt = ${out_arg}_new_fw_grad_opts[i];
+  if (${out_arg}_new_fw_grad_opt.has_value() && ${out_arg}_new_fw_grad_opt.value().defined() && ${out_arg}[i].defined()) {
+    // The hardcoded 0 here will need to be updated once we support multiple levels.
+ ${out_arg}[i]._set_fw_grad(${out_arg}_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ ${is_inplace}); + } +} +""" +) + +FW_DERIVATIVE_SETTER_MULTI_OUTPUT = CodeTemplate( + """\ +if (${all_res}_new_fw_grad_opt.has_value() && std::get<${idx}>(${all_res}_new_fw_grad_opt.value()).defined() + && ${out_arg}.defined()) { + ${out_arg}._set_fw_grad(std::get<${idx}>(${all_res}_new_fw_grad_opt.value()), /* level */ 0, /* is_inplace_op */ false); +} +""" +) + +FW_DERIVATIVE_SETTER_TENSOR_LIST = CodeTemplate( + """\ +if (${out_arg}_new_fw_grad_opt.has_value()) { + auto ${out_arg}_new_fw_grad = ${out_arg}_new_fw_grad_opt.value(); + TORCH_INTERNAL_ASSERT(${out_arg}.size() == ${out_arg}_new_fw_grad.size()); + for (const auto i : c10::irange(${out_arg}.size())) { + if (${out_arg}_new_fw_grad[i].defined() && ${out_arg}[i].defined()) { + // The hardcoded 0 here will need to be updated once we support multiple levels. + ${out_arg}[i]._set_fw_grad(${out_arg}_new_fw_grad[i], /* level */ 0, /* is_inplace_op */ ${is_inplace}); + } + } +} +""" +) + +FW_DERIVATIVE_TEMPLATE = CodeTemplate( + """\ +${fw_grad_opt_definition} +if (${requires_fw_grad}) { + ${unpacked_arguments} + ${out_arg}_new_fw_grad_opt = ${formula}; +} +""" +) + +FW_DERIVATIVE_FOREACH_TEMPLATE = CodeTemplate( + """\ +${fw_grad_opt_definition} +for (const auto& i : c10::irange(${vector_of_optional_tensor}.size())) { + if (${any_has_forward_grad_for_current_index}) { + ${unpacked_arguments} + ${vector_of_optional_tensor}[i] = ${formula}; + } +} +""" +) + +FW_DERIVATIVE_FORBID_TEMPLATE = CodeTemplate( + """\ +TORCH_CHECK_NOT_IMPLEMENTED(!(${cond}), "Trying to use forward AD with ${name} that does not support it ${msg}"); +""" +) + +FW_DERIVATIVE_FORBID_LIST_TEMPLATE = CodeTemplate( + """\ +for (const auto& _t: ${arg}) { + TORCH_CHECK_NOT_IMPLEMENTED(!(${cond}), "Trying to use forward AD with ${name} that does not support it ${msg}"); +} +""" +) + + +def gen_variable_type( + out: str, + native_yaml_path: str, + tags_yaml_path: str, + fns_with_diff_infos: list[NativeFunctionWithDifferentiabilityInfo], + template_path: str, + used_keys: set[str], +) -> None: + """VariableType.h and VariableType.cpp body + + This is the at::Type subclass for differentiable tensors. The + implementation of each function dispatches to the base tensor type to + compute the output. The grad_fn is attached to differentiable functions. 
+ """ + fm = FileManager(install_dir=out, template_dir=template_path, dry_run=False) + fm.write( + "VariableType.h", + lambda: { + "generated_comment": "@" + + f"generated from {fm.template_dir_for_comments()}/VariableType.h" + }, + ) + + # helper that generates a TORCH_LIBRARY_IMPL macro for each + # dispatch key that appears in derivatives.yaml + def wrapper_registrations(used_keys: set[str]) -> str: + library_impl_macro_list: list[str] = [] + for key in sorted(used_keys): + dispatch_key = key + if key == "Default": + dispatch_key = "Autograd" + library_impl_macro = ( + f"TORCH_LIBRARY_IMPL(aten, {dispatch_key}, m) " + + "{\n" + + "${" + + f"wrapper_registrations_{key}" + + "}\n}" + ) + library_impl_macro_list += [library_impl_macro] + return "\n\n".join(library_impl_macro_list) + + # Generate a new template from VariableType.cpp which replaces ${wrapper_registrations} + # with per key TORCH_LIBRARY_IMPL macros for each key that appears in derivatives.yaml + fm1 = FileManager( + install_dir=out + "/templates", template_dir=template_path, dry_run=False + ) + fm1.write( + "VariableType.cpp", + lambda: { + "type_derived_method_definitions": "\n\n".join( + [ + "${" + f"type_derived_method_definitions_{key}" + "}" + for key in sorted(used_keys) + ] + ), + "wrapper_registrations": wrapper_registrations(used_keys), + }, + ) + + # Generate final VariableType_*.cpp files from the generated template + fm2 = FileManager(install_dir=out, template_dir=out + "/templates", dry_run=False) + + sharded_keys = set( + [f"type_derived_method_definitions_{key}" for key in sorted(used_keys)] + + [f"wrapper_registrations_{key}" for key in sorted(used_keys)] + ) + # NOTE: see Note [Sharded File] at the top of the VariableType.cpp + # template regarding sharding of the generated files. + fm2.write_sharded( + "VariableType.cpp", + [fn for fn in fns_with_diff_infos if use_derived(fn)], + key_fn=lambda fn: cpp.name(fn.func.func), + base_env={ + "generated_comment": "@" + + f"generated from {fm.template_dir_for_comments()}/VariableType.cpp", + }, + env_callable=gen_variable_type_func, + num_shards=5, + sharded_keys=sharded_keys, + ) + + +@with_native_function_and +def gen_wrapper_registration(f: NativeFunction, key: str = "Default") -> str: + return WRAPPER_REGISTRATION.substitute( + unqual_operator_name_with_overload=f.func.name, + type_wrapper_name=type_wrapper_name(f, key), + class_type="VariableType", + ) + + +def gen_variable_type_func( + fn: NativeFunctionWithDifferentiabilityInfo, +) -> dict[str, list[str]]: + f = fn.func + result = {} + with native_function_manager(f): + name = cpp.name(f.func) + formals = gen_formals(f) + + if ( + fn.info is None + and str(f.func.name.name) not in RESET_GRAD_ACCUMULATOR + and get_base_name(f) not in DONT_REQUIRE_DERIVATIVE + and len(gen_differentiable_outputs(fn)) > 0 + and cpp.name(f.func) not in DONT_ENFORCE_SAME_TENSOR_IMPL_OR_STORAGE + and type_wrapper_name(f) not in DONT_ENFORCE_STORAGE_IMPL_USE_COUNT + and type_wrapper_name(f) not in DONT_ENFORCE_TENSOR_IMPL_USE_COUNT + ): + # NOTE: [ Registering AutogradNotImplemented boxed kernel ] + # + # When there is no derivatives.yaml entry, we register a generic boxed + # NotImplemented kernel to set grad_fn to be NotImplemented, so that forward + # proceeds as usual but an error is properly produced on backward. 
+            # TODO: it would be nice to not have these special cases
+            #
+            # There are several cases where we still let codegen handle it:
+            # 1) ops that need to reset grad accumulator (we let codegen handle this
+            #    case because the list is (currently) only accessible in Python).
+            # 2) User explicitly specifies DONT_REQUIRE_DERIVATIVE. This basically makes
+            #    autograd a fallthrough with NDEBUG checks. This can be useful for when all
+            #    outputs are integral.
+            # 3) When there are no differentiable outputs. This is similar to (2).
+            # 4) There are certain ops where we skip certain NDEBUG checks. This is similar
+            #    to (1).
+            type_definition = ""
+            wrapper_registration = AUTOGRAD_NOT_IMPLEMENTED_REGISTRATION.substitute(
+                unqual_operator_name_with_overload=f.func.name
+            )
+            result["type_derived_method_definitions_Default"] = [type_definition]
+            result["wrapper_registrations_Default"] = [wrapper_registration]
+        else:
+            if not fn.info:
+                key = "Default"
+                type_definition = METHOD_DEFINITION.substitute(
+                    return_type=cpp.returns_type(
+                        f.func.returns, symint=True
+                    ).cpp_type(),
+                    type_wrapper_name=type_wrapper_name(f, key),
+                    type_definition_body=emit_body(fn, key),
+                    formals=formals,
+                )
+                wrapper_registration = gen_wrapper_registration(f, key)
+                result[f"type_derived_method_definitions_{key}"] = [type_definition]
+                result[f"wrapper_registrations_{key}"] = [wrapper_registration]
+            else:
+                for key in fn.info:
+                    type_definition = METHOD_DEFINITION.substitute(
+                        return_type=cpp.returns_type(
+                            f.func.returns, symint=True
+                        ).cpp_type(),
+                        type_wrapper_name=type_wrapper_name(f, key),
+                        type_definition_body=emit_body(fn, key),
+                        formals=formals,
+                    )
+                    wrapper_registration = gen_wrapper_registration(f, key)
+                    result[f"type_derived_method_definitions_{key}"] = [type_definition]
+                    result[f"wrapper_registrations_{key}"] = [wrapper_registration]
+        # See Note [Manual Backend kernels]
+        assert (name in MANUAL_BACKEND) == f.manual_kernel_registration
+        # If you want to register a kernel to Autograd, you must make the op abstract.
+        # In other words, this op must have a dispatch section in native_functions.yaml.
+        if name in MANUAL_AUTOGRAD_AND_TRACER or (
+            fn.info and any(info.has_derivatives for info in fn.info.values())
+        ):
+            msg = (
+                f"There's a formula for {name} (or its functional variant) in derivatives.yaml. "
+                f"It's required to add a dispatch section for it with explicit supported backends e.g. CPU/CUDA "
+                f"or CompositeExplicitAutograd in native_functions.yaml. Please see "
+                f"https://github.com/pytorch/pytorch/tree/master/aten/src/ATen/native#choosing-the-right-dispatch-keyword "
+                f"for instructions to choose the right dispatch keyword."
+            )
+            assert f.is_abstract, msg
+
+    return result
+
+
+_foreach_ops_without_differentiability_info = {
+    # No reference backward available due to the lack of `{maximum, minimum}(tensor, scalar)`.
+    ("_foreach_maximum", "Scalar"),
+    ("_foreach_maximum", "ScalarList"),
+    ("_foreach_minimum", "Scalar"),
+    ("_foreach_minimum", "ScalarList"),
+    # No reference backward available as addcdiv/addcmul don't support Tensor as scaling factor.
+    ("_foreach_addcdiv", "Tensor"),
+    ("_foreach_addcmul", "Tensor"),
+    ("_foreach_copy", ""),
+}
+
+_foreach_ops_with_different_arity = {
+    # These ops lack an `alpha` scaling factor to be applied to the right-hand side argument.
+ ("_foreach_add", "Scalar"), + ("_foreach_add", "ScalarList"), + ("_foreach_sub", "Scalar"), + ("_foreach_sub", "ScalarList"), +} + + +@with_native_function_with_differentiability_info_and_key +def emit_body( + fn: NativeFunctionWithDifferentiabilityInfo, key: str = "Default" +) -> list[str]: + assert dispatch_strategy(fn) == "use_derived" + f = fn.func + info = fn.info[key] if fn.info else None + fw_derivatives = fn.fw_derivatives.get(key, []) if fn.fw_derivatives else [] + + name = cpp.name(f.func) + inplace = f.func.kind() == SchemaKind.inplace + is_out_fn = f.func.kind() == SchemaKind.out + returns_void = len(f.func.returns) == 0 + base_name = get_base_name(f) + view_info = get_view_info(f) + + is_foreach = name.startswith("_foreach") + is_inplace_foreach = is_foreach and inplace + if is_inplace_foreach: + inplace_foreacharg2refarg: dict[Argument, Argument] = {} + refargname2inplace_foreacharg: dict[str, Argument] = {} + base_name_and_overload_name = (f.func.name.name.base, f.func.name.overload_name) + if info is None: + assert ( + base_name_and_overload_name + in _foreach_ops_without_differentiability_info + ), ( + f"{'.'.join(base_name_and_overload_name)} should have a differentiability info" + ) + else: + assert ( + len(f.func.arguments.flat_non_out) + == len(info.func.func.arguments.flat_non_out) + ) or (base_name_and_overload_name in _foreach_ops_with_different_arity), ( + f"{'.'.join(base_name_and_overload_name)} has {len(f.func.arguments.flat_non_out)} args " + f"but the reference has {len(info.func.func.arguments.flat_non_out)}" + ) + for foreach_arg, ref_arg in zip( + f.func.arguments.flat_non_out, info.func.func.arguments.flat_non_out + ): + foreach_arg_type = foreach_arg.type + if isinstance(foreach_arg_type, ListType): + foreach_arg_type = foreach_arg_type.elem + assert foreach_arg_type == ref_arg.type + inplace_foreacharg2refarg[foreach_arg] = ref_arg + refargname2inplace_foreacharg[ref_arg.name] = foreach_arg + + def gen_differentiable_input( + arg: Argument | SelfArgument | TensorOptionsArguments, + ) -> DifferentiableInput | None: + if isinstance(arg, TensorOptionsArguments): + return None + a: Argument = arg.argument if isinstance(arg, SelfArgument) else arg + + # TODO: `cpp_type` is only to keep it byte-for-byte compatible with the old codegen, should remove. + # NB: This is not a clone of cpp.argument() - TensorOptionsArguments / faithful / binds are + # not handled properly as they are irrelevant for this codegen. + cpp_type = cpp.argument_type(a, binds=a.name, symint=True).cpp_type() + + if not is_differentiable(a.name, a.type, info): + return None + return DifferentiableInput( + name=a.name, + type=a.type, + cpp_type=cpp_type, + ) + + @with_native_function + def gen_differentiable_inputs(f: NativeFunction) -> list[DifferentiableInput]: + arguments = list(f.func.arguments.non_out) + if is_inplace_foreach and info is not None: + for i, arg in enumerate(f.func.arguments.flat_non_out): + if arg in inplace_foreacharg2refarg: + # note(crcrpar): From what I understand, what matters is only the name. + # Thus originally I only replace argument only when the names are different. + # TODO(crcrpar): Make it simpler. 
+                    mapped_arg = inplace_foreacharg2refarg[arg]
+                    arguments[i] = Argument(
+                        mapped_arg.name,
+                        mapped_arg.type,
+                        mapped_arg.default,
+                        mapped_arg.annotation,
+                    )
+        return list(mapMaybe(gen_differentiable_input, arguments))
+
+    def find_args_with_derivatives(
+        differentiable_inputs: list[DifferentiableInput],
+    ) -> list[DifferentiableInput]:
+        """Find arguments that have derivative definitions"""
+        if info is None or not info.has_derivatives:
+            return differentiable_inputs
+        names = {name for d in info.derivatives for name in d.var_names}
+        differentiable = [arg for arg in differentiable_inputs if arg.name in names]
+        if len(differentiable) != len(names):
+            missing = names - {arg.name for arg in differentiable}
+            raise RuntimeError(
+                f"Missing arguments for derivatives: {missing} in {info.name}"
+            )
+        return differentiable
+
+    differentiable_inputs = gen_differentiable_inputs(f)
+    args_with_derivatives = find_args_with_derivatives(differentiable_inputs)
+    differentiable_outputs = gen_differentiable_outputs(fn, key)
+
+    undifferentiable = (base_name in DONT_REQUIRE_DERIVATIVE) or (
+        name in DONT_REQUIRE_DERIVATIVE
+    )
+
+    requires_derivative = (
+        (not undifferentiable)
+        and (len(differentiable_inputs) > 0)
+        and (
+            (len(differentiable_outputs) > 0)
+            # note(crcrpar): In-place foreach functions are void functions.
+            or is_inplace_foreach
+        )
+    )
+
+    if (
+        info is not None
+        and info.has_derivatives
+        and not requires_derivative
+        # out= ops are allowed to have zero returns, which causes requires_derivative to be False;
+        # we shouldn't error out though (out= ops for autograd just redispatch)
+        and len(f.func.returns) > 0
+    ):
+        raise RuntimeError(
+            f"ERROR: derivative ignored for {name} -- specified an autograd function without derivative"
+        )
+
+    # note(crcrpar): In-place foreach functions do not support forward AD
+    if requires_derivative and len(fw_derivatives) > 0 and not is_inplace_foreach:
+        assert sum(len(derivative.var_names) for derivative in fw_derivatives) == len(
+            differentiable_outputs
+        ), (
+            "Expected the number of forward derivatives implemented to match the "
+            "number of differentiable outputs. NB: This only applies when at least "
+            "one forward derivative is implemented. Not implementing any forward "
+            "derivatives is also okay, and we would require inputs to the op to "
+            "not have associated tangents in that case."
+        )
+
+    try_jit_decomposition = (
+        requires_derivative
+        and len(fw_derivatives) == 0
+        and (not modifies_arguments(f))
+        and (not returns_void)
+    )
+
+    def emit_save_inputs() -> list[str]:
+        setup: list[str] = []
+        if info is None or not info.has_derivatives:
+            return setup
+
+        has_tensorlist_arg = any(
+            is_tensor_list_type(arg.type) for arg in args_with_derivatives
+        )
+
+        # We don't want to save tensors if we know that they will never be used
+        # when computing the derivative, so we add guards to those statements
+        def guard_for(arg: SavedAttribute) -> str | None:
+            assert info is not None
+
+            # It's hard to determine the edge offset if we have TensorLists
+            # NOTE(crcrpar): in-place foreach functions' arguments include tensorlist
+            # but their derivatives don't use it, so let them bypass this check.
+            if has_tensorlist_arg and (not is_inplace_foreach):
+                return None
+
+            # Empirical evaluation of the cases where we insert those guards in
+            # backward shows that they are somewhat useless. E.g. there's no need
+            # to guard on some values captured from forward, because they had to
+            # require_grad if the backward function even gets executed.
+            # I don't have any good ideas for detecting those cases, so I simply
+            # disabled the checks.
+            if "backward" in info.name:
+                return None
+
+            # If there's a single derivative we could compute, we already have
+            # a requires_grad check that is sufficient
+            if len(args_with_derivatives) <= 1:
+                return None
+
+            # We really only care about trimming down the number of tensors we save
+            if arg.nctype.type != BaseCType(tensorT):
+                return None
+
+            # We want to emit simple guards, so we only allow that if checking one
+            # input is enough to determine whether we need that value
+            used_in = [d for d in info.derivatives if arg in d.saved_inputs]
+            assert len(used_in) > 0
+            if len(used_in) != 1:
+                return None
+            derivative = used_in[0]
+
+            # Case with multioutput formulas
+            # TODO: process all derivative formulas!!!
+            if len(derivative.var_names) != 1:
+                wrap_opt_if_start = derivative.formula.find(
+                    f"wrap_opt_if({arg.nctype.name}"
+                )
+                if wrap_opt_if_start == -1:
+                    return None
+
+                wrap_opt_if_match = re.match(
+                    rf"wrap_opt_if\({arg.nctype.name},(.*?)\)",
+                    derivative.formula[wrap_opt_if_start:],
+                )
+                assert wrap_opt_if_match is not None
+
+                # Condition is between 'wrap_opt_if(var_name,' and ')'.
+                condition_slice = slice(len(rf"wrap_opt_if\({arg.nctype.name},"), -1)
+                wrap_opt_if_condition = wrap_opt_if_match.group(0)[
+                    condition_slice
+                ].strip()
+                # replace 'grad_input_mask[num]' with 'grad_fn->should_compute_output(num)'
+                wrap_opt_if_condition = re.sub(
+                    r"grad_input_mask\[(\d+)\]",
+                    r"grad_fn->should_compute_output(\1)",
+                    wrap_opt_if_condition,
+                )
+                return f"{wrap_opt_if_condition}"
+
+            # Figure out the offset of the edge that uses this variable
+            derivative_var_name = derivative.var_names[0]
+            for edge_off, a in enumerate(args_with_derivatives):
+                if a.name == derivative_var_name:
+                    break
+            else:
+                raise AssertionError
+            return f"grad_fn->should_compute_output({edge_off})"
+
+        if is_inplace_foreach:
+            save_input_stmts = save_variables(info.all_saved_inputs, False, guard_for)
+            if save_input_stmts:
+                setup.append(
+                    LOOP_OVER_VECTOR_OF_GRAD_FNS.substitute(
+                        preamble="", statements=save_input_stmts
+                    )
+                )
+        else:
+            setup.extend(save_variables(info.all_saved_inputs, False, guard_for))
+            for arg in args_with_derivatives:
+                if is_tensor_list_type(arg.type):
+                    setup.append(f"grad_fn->{arg.name}_size_ = {arg.name}.size();")
+        return setup
+
+    def setup_derivative(differentiable_inputs: list[DifferentiableInput]) -> list[str]:
+        body: list[str] = []
+        if is_out_fn:
+            # For out functions, ensure that no input or output requires grad
+            body.append(DECLARE_GRAD_FN.substitute(op="Node"))
+            body.append(
+                SETUP_NONE_REQUIRES_GRAD.substitute(
+                    base_name=base_name,
+                    args_to_check=[arg.name for arg in differentiable_inputs],
+                )
+            )
+            body.append(
+                SETUP_NONE_REQUIRES_GRAD.substitute(
+                    base_name=base_name,
+                    args_to_check=[arg.name for arg in differentiable_outputs],
+                )
+            )
+            return body
+
+        op = info.op if info is not None and info.has_derivatives else "NotImplemented"
+        setup = []
+        if not is_inplace_foreach:
+            setup.extend(
+                ASSIGN_GRAD_FN.substitute(
+                    op=op,
+                    op_ctor=""
+                    if info is not None and info.has_derivatives
+                    else f'"{cpp.name(f.func)}"',
+                    args_with_derivatives=[arg.name for arg in args_with_derivatives],
+                ).split("\n")
+            )
+        else:
+            # note(crcrpar): Assuming in-place foreach function's self_arg is always TensorList.
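+            # (Sketch under that assumption: the ASSIGN_VECTOR_OF_GRAD_FN template
+            # substituted below runs a loop over self.size(), evaluates
+            # compute_requires_grad(...) per element, and pushes either a fresh
+            # grad_fn node or nullptr into grad_fns for each index.)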
+ list_like_arg = "self" + args = [arg.name for arg in args_with_derivatives] + for i, arg in enumerate(args): + if is_inplace_foreach and info is not None: + if arg in refargname2inplace_foreacharg: + foreach_arg = refargname2inplace_foreacharg[arg] + args[i] = foreach_arg.name + ( + "[i]" if isinstance(foreach_arg.type, ListType) else "" + ) + else: + if arg == list_like_arg: + args[i] = arg + "[i]" + setup.extend( + ASSIGN_VECTOR_OF_GRAD_FN.substitute( + op=op, + op_ctor="" + if info is not None and info.has_derivatives + else f'"{cpp.name(f.func)}"', + args_with_derivatives=args, + irange=f"{list_like_arg}.size()", + ).split("\n") + ) + setup.extend(emit_save_inputs()) + + body.extend( + emit_check_no_requires_grad(differentiable_inputs, args_with_derivatives) + ) + declare_grad_fn_template = ( + DECLARE_GRAD_FN if not is_inplace_foreach else DECLARE_VECTOR_OF_GRAD_FN + ) + body.append(declare_grad_fn_template.substitute(op=op)) + body.append(SETUP_DERIVATIVE.substitute(setup=setup)) + return body + + def emit_check_if_in_complex_autograd_allowlist() -> list[str]: + body: list[str] = [] + if base_name in GRADIENT_IMPLEMENTED_FOR_COMPLEX: + return body + for arg in differentiable_outputs: + name = arg.name + # TODO: should be `arg.type.is_tensor_like()`? + if arg.cpp_type == "at::Tensor" or arg.cpp_type in TENSOR_LIST_LIKE_CTYPES: + body.append(f'throw_error_for_complex_autograd({name}, "{base_name}");') + return body + + def emit_check_no_requires_grad( + tensor_args: list[DifferentiableInput], + args_with_derivatives: list[DifferentiableInput], + ) -> list[str]: + """Checks that arguments without derivatives don't require grad""" + body: list[str] = [] + for arg in tensor_args: + if arg in args_with_derivatives: + continue + arg_name = arg.name + if info and arg_name in info.non_differentiable_arg_names: + continue + if arg_name == "output": + # Double-backwards definitions sometimes take in 'input' and + # 'output', but only define the derivative for input. 
+                continue
+            body.append(f'check_no_requires_grad({arg_name}, "{arg_name}", "{name}");')
+        return body
+
+    def emit_original_self_definition() -> list[str]:
+        body: list[str] = []
+        if inplace:
+            if is_inplace_foreach:
+                body.append(
+                    "std::vector<::std::optional<at::Tensor>> original_selfs(self.size());"
+                )
+            else:
+                body.append("::std::optional<at::Tensor> original_self;")
+
+            all_forward_grad_cond = []
+            for derivative in fw_derivatives:
+                if derivative.required_original_self_value:
+                    all_forward_grad_cond.append(
+                        get_any_has_forward_grad_name(derivative.var_names)
+                    )
+
+            if all_forward_grad_cond:
+                if not is_inplace_foreach:
+                    body.append(f"if ({' || '.join(all_forward_grad_cond)}) {{")
+                    body.append("  original_self = self.clone();")
+                    body.append("}")
+                else:
+                    current_all_forward_grad_cond = [
+                        f"{cond}[i]" for cond in all_forward_grad_cond
+                    ]
+                    body.append("for (const auto& i : c10::irange(self.size())) {")
+                    body.append(
+                        f"  if ({' || '.join(current_all_forward_grad_cond)}) {{"
+                    )
+                    body.append("    original_selfs[i] = self[i].clone();")
+                    body.append("  }")
+                    body.append("}")
+
+        return body
+
+    def save_variables(
+        saved_variables: Sequence[SavedAttribute],
+        is_output: bool,
+        guard_for: Callable[[SavedAttribute], str | None] = lambda name: None,
+    ) -> Sequence[str]:
+        # assign the saved variables to the generated grad_fn
+        stmts: list[str] = []
+        for arg in sorted(saved_variables, key=lambda sa: str(sa.nctype.name)):
+            name = (
+                arg.nctype.name.name
+                if isinstance(arg.nctype.name, SpecialArgName)
+                else arg.nctype.name
+            )
+            foreacharg: Argument | None = None
+            is_foreacharg_list_type: bool = False
+            type = arg.nctype.type
+            expr = arg.expr
+            stmts_prepend = None
+            if is_inplace_foreach and info is not None:
+                # todo(crcrpar): See if we can add some check e.g. `assert foreacharg is not None`.
+                # For now the example assert would fail.
+                name_to_query = name.split("_scalar_type")[0]
+                if name_to_query in refargname2inplace_foreacharg:
+                    foreacharg = refargname2inplace_foreacharg[name_to_query]
+                    is_foreacharg_list_type = isinstance(foreacharg.type, ListType)
+                if foreacharg is not None:
+                    name_in_expr = (
+                        f"{foreacharg.name}{'[i]' if is_foreacharg_list_type else ''}"
+                    )
+                    src_name = name
+                    if "_scalar_type" in src_name:
+                        split_src_name = src_name.split("_scalar_type")
+                        assert len(split_src_name) == 2
+                        src_name = split_src_name[0]
+                    expr = expr.replace(src_name, name_in_expr)
+            if (
+                type == BaseCType(tensorT)
+                or type == OptionalCType(BaseCType(tensorT))
+                or type == MutRefCType(OptionalCType(BaseCType(tensorT)))
+                or (is_output and type == BaseCType(scalarT))
+            ):
+                # note(crcrpar): Here `expr` is generated from scratch, `arg.expr` is ignored.
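+                # (Context: SavedVariable is autograd's wrapper that snapshots a
+                # tensor's metadata so saved values can be checked and unpacked
+                # safely at backward time; its second argument marks the value as
+                # an output of this op.)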
+                var = name
+                name += "_"
+                if var == "self" and inplace:
+                    original_self_var = (
+                        "original_self"
+                        if not is_inplace_foreach
+                        else "original_selfs[i]"
+                    )
+                    self_var = var if not is_inplace_foreach else var + "[i]"
+                    stmts_prepend = f"if (!{original_self_var}.has_value()) {original_self_var} = {self_var}.clone()"
+                    var = f"{original_self_var}.value()"
+                    assert not is_output
+                if inplace and is_output:
+                    assert name == "result_"
+                    var = (
+                        "self[i]"
+                        if is_inplace_foreach or is_foreacharg_list_type
+                        else "self"
+                    )
+                    is_inplace_view = f"{var}.is_view()"
+                    expr = f"SavedVariable({var}, {str(is_output).lower()}, {is_inplace_view})"
+                else:
+                    expr = f"SavedVariable({var}, {str(is_output).lower()})"
+                if foreacharg is not None and "original_selfs" not in expr:
+                    # pyrefly: ignore [unbound-name]
+                    expr = expr.replace(src_name, name_in_expr)
+            elif (
+                type == BaseCType(tensorListT)
+                or type == ListCType(OptionalCType(BaseCType(tensorT)))
+                or type == BaseCType(iTensorListRefT)
+                or type == VectorCType(BaseCType(tensorT))
+            ):
+                # See Note [nuanced return type of out-of-place foreach functions]
+                if type == VectorCType(BaseCType(tensorT)):
+                    assert is_foreach and is_output
+                expr = f"make_saved_variable_list({name}, {str(is_foreach and is_output).lower()})"
+                name += "_"
+            elif type == BaseCType(intArrayRefT):
+                expr = expr + ".vec()"
+            elif type == BaseCType(symIntArrayRefT):
+                expr = expr + ".vec()"
+            elif type == BaseCType(stringT):
+                expr = f"std::string({expr})"
+            elif type == OptionalCType(BaseCType(stringT)):
+                expr = f"{expr}.has_value() ? ::std::optional<std::string>(std::string({expr}.value())) : ::std::nullopt"
+            elif type == ArrayRefCType(
+                elem=BaseCType(type=BaseCppType(ns="at", name="Scalar"))
+            ):
+                expr = expr + ".vec()"
+
+            guard = guard_for(arg)
+            if guard is None:
+                if stmts_prepend:
+                    stmts.append(f"{stmts_prepend};")
+                stmts.append(f"grad_fn->{name} = {expr};")
+            else:
+                stmts.append(f"if ({guard}) {{")
+                if stmts_prepend:
+                    stmts.append(f"  {stmts_prepend};")
+                stmts.append(f"  grad_fn->{name} = {expr};")
+                stmts.append("}")
+        return stmts
+
+    # Generates a Dispatcher::redispatch() call into the dispatcher. We do this mainly for performance reasons:
+    #  - Pre-compute the full DispatchKeySet. This saves the dispatcher from having to read from TLS.
+    #  - redispatch() avoids a redundant call to RecordFunction, which was already called right before
+    #    we entered this autograd kernel.
+    def emit_dispatch_call(
+        f: NativeFunction, input_base: str, unpacked_args: Sequence[str]
+    ) -> str:
+        """Dispatch call via function in a namespace or method on Tensor."""
+        # code-generated autograd kernels plumb and recompute dispatch keys directly through the kernel for performance.
+        # Ops also always have a function variant of the redispatch API.
+        # See Note [Plumbing Keys Through The Dispatcher] for details.
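+        # (Illustrative only: for a hypothetical add call this emits something like
+        # "at::redispatch::add(ks & c10::after_autograd_keyset, self_, other_, alpha)",
+        # where the precomputed keyset argument is what lets redispatch() skip TLS.)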
+ dispatch_key_set = "ks & c10::after_autograd_keyset" + call = CALL_REDISPATCH.substitute( + api_name=cpp.name( + f.func, + faithful_name_for_out_overloads=True, + symint_overload=f.func.has_symint(), + ), + unpacked_args=[dispatch_key_set] + list(unpacked_args), + ) + return call + + def wrap_output( + f: NativeFunction, unpacked_bindings: list[Binding], var: str + ) -> str: + call = "" + rhs_value: str | None = None + if not any(r.type.is_tensor_like() for r in f.func.returns): + rhs_value = var + else: + rhs_value = f"std::move({var})" + assert rhs_value is not None + call += ASSIGN_RETURN_VALUE.substitute( + return_values=tie_return_values(f), rhs_value=rhs_value + ) + return call + + def check_tensorimpl_and_storage( + call: str, unpacked_bindings: list[Binding] + ) -> str: + # See NOTE [ TensorImpl and Storage Pointer Sanity Checks ] + stmts_before_call: list[str] = [] + stmts_after_call: list[str] = [] + + if cpp.name(f.func) in DONT_ENFORCE_SAME_TENSOR_IMPL_OR_STORAGE: + return call + + # Check properties of inputs (enforce (1)) + for unpacked_binding in unpacked_bindings: + arg = unpacked_binding.name + noref_cpp_type = unpacked_binding.nctype.type.remove_const_ref() + if noref_cpp_type == BaseCType(tensorListT) or noref_cpp_type == BaseCType( + iTensorListRefT + ): + stmts_before_call += [ + SAVE_TENSORLIST_STORAGE.substitute(tensorlist_name=arg), + SAVE_TENSORLIST_IMPL.substitute(tensorlist_name=arg), + ] + stmts_after_call += [ + ENFORCE_SAME_TENSORLIST_STORAGE.substitute(tensorlist_name=arg), + ENFORCE_SAME_TENSORLIST_IMPL.substitute(tensorlist_name=arg), + ] + elif noref_cpp_type == ListCType(OptionalCType(BaseCType(tensorT))): + stmts_before_call += [ + SAVE_OPTIONALTENSORLIST_STORAGE.substitute(tensorlist_name=arg), + SAVE_OPTIONALTENSORLIST_IMPL.substitute(tensorlist_name=arg), + ] + stmts_after_call += [ + ENFORCE_SAME_OPTIONALTENSORLIST_STORAGE.substitute( + tensorlist_name=arg + ), + ENFORCE_SAME_OPTIONALTENSORLIST_IMPL.substitute( + tensorlist_name=arg + ), + ] + elif noref_cpp_type == BaseCType(tensorT): + stmts_before_call += [ + SAVE_TENSOR_STORAGE.substitute(tensor_name=arg), + SAVE_TENSOR_IMPL.substitute(tensor_name=arg), + ] + stmts_after_call += [ + ENFORCE_SAME_TENSOR_STORAGE.substitute( + tensor_name=arg, out_tensor_name=arg + ), + ENFORCE_SAME_TENSOR_IMPL.substitute(tensor_name=arg), + ] + + assert (stmts_before_call and stmts_after_call) or ( + not stmts_before_call and not stmts_after_call + ) + + # Check properties of outputs (enforce (2), (3)) + if f.func.kind() not in (SchemaKind.inplace, SchemaKind.out): + base_name = f.func.name.name.base # TODO: should be str(f.func.name.name)? 
+            aliased_arg_name = ALL_VIEW_FUNCTIONS.get(base_name, None)
+            if aliased_arg_name is not None:
+                aliased_arg_name = unpacked_name(aliased_arg_name)
+            for i, (ret, ret_name) in enumerate(
+                zip(f.func.returns, cpp.return_names(f))
+            ):
+                noref_cpp_type = cpp.return_type(ret, symint=True).remove_const_ref()
+                if noref_cpp_type == BaseCType(tensorT):
+                    if aliased_arg_name is not None:
+                        assert i == 0, (
+                            f"Expect non-CompositeImplicitAutograd view function {base_name} to return single output"
+                        )
+                        stmts_after_call += [
+                            ENFORCE_SAME_TENSOR_STORAGE.substitute(
+                                tensor_name=aliased_arg_name, out_tensor_name=ret_name
+                            )
+                        ]
+                    else:
+                        if (
+                            type_wrapper_name(f)
+                            not in DONT_ENFORCE_STORAGE_IMPL_USE_COUNT
+                        ):
+                            stmts_after_call += [
+                                ENFORCE_TENSOR_STORAGE_USE_COUNT_EQUALS_ONE.substitute(
+                                    tensor_name=ret_name, fn_name=type_wrapper_name(f)
+                                )
+                            ]
+
+                    if type_wrapper_name(f) not in DONT_ENFORCE_TENSOR_IMPL_USE_COUNT:
+                        stmts_after_call += [
+                            ENFORCE_TENSOR_IMPL_USE_COUNT_LT_OR_EQ_ONE.substitute(
+                                tensor_name=ret_name, fn_name=type_wrapper_name(f)
+                            )
+                        ]
+
+                # Currently we don't have any functions that return the following types, but
+                # we should update the checks once we do
+                elif noref_cpp_type == ListCType(OptionalCType(BaseCType(tensorT))):
+                    raise AssertionError(
+                        f"Please add use_count checks for {noref_cpp_type}"
+                    )
+                elif noref_cpp_type == BaseCType(tensorListT):
+                    raise AssertionError(
+                        f"Please add use_count checks for {noref_cpp_type}"
+                    )
+
+        if stmts_before_call and stmts_after_call:
+            call = (
+                RUN_ONLY_IN_DEBUG_MODE.substitute(statements=stmts_before_call)
+                + call
+                + RUN_ONLY_IN_DEBUG_MODE.substitute(statements=stmts_after_call)
+            )
+        return call
+
+    def emit_call(
+        f: NativeFunction, unpacked_bindings: list[Binding], try_jit_decomposition: bool
+    ) -> str:
+        # We only care about adding `at::AutoDispatchBelowAutograd` guard for non-variable dispatch
+        # (which corresponds to 'use_derived' strategy). The purpose of this guard is to make sure
+        # the baseType operations still dispatch to non-Variable type, even if the arguments passed
+        # in are now Variables.
+        # See NOTE [ Treating Variables as non-Variables in type dispatch ] for details.
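+        # (Guard choice below: view and mutating ops use AutoDispatchBelowAutograd,
+        # which excludes only the autograd keys so the ADInplaceOrView kernel still
+        # runs; purely functional ops use AutoDispatchBelowADInplaceOrView to skip
+        # that extra hop as well.)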
+ unpacked_args = [b.name for b in unpacked_bindings] + base_type_call = emit_dispatch_call(f, "self_", unpacked_args) + + if get_view_info(f) is not None or modifies_arguments(f): + guard = "at::AutoDispatchBelowAutograd guard;" + else: + guard = "at::AutoDispatchBelowADInplaceOrView guard;" + + any_has_forward_grad = ( + get_any_has_fw_grad_cond(derivative=None) + if requires_derivative + else "false" + ) + return_types = ", ".join( + [cpp.return_type(a, symint=True).cpp_type() for a in f.func.returns] + ) + if len(f.func.returns) > 1: + return_types = f"std::tuple<{return_types}>" + + arg_names = [ + a.name + for a in cpp.arguments( + f.func.arguments, + faithful=True, + symint=True, + method=False, + cpp_no_default_args=set(), + ) + ] + + if not modifies_arguments(f) and not returns_void: + if try_jit_decomposition: + call = DISPATCH_TO_NON_VAR_TYPE_WITH_TMP_RETURN_VALUES_JVP_DECOMP.substitute( + base_type_call=base_type_call, + tmp_var=TMP_VAR, + guard=guard, + any_has_forward_grad=any_has_forward_grad, + op_name=cpp.name(f.func), + op_overload=f.func.name.overload_name, + return_types=return_types, + arg_names=arg_names, + ) + else: + call = DISPATCH_TO_NON_VAR_TYPE_WITH_TMP_RETURN_VALUES.substitute( + base_type_call=base_type_call, + tmp_var=TMP_VAR, + guard=guard, + ) + + call += wrap_output(f, unpacked_bindings, TMP_VAR) + else: + assert not try_jit_decomposition + call = DISPATCH_TO_NON_VAR_TYPE_WITHOUT_RETURN_VALUES.substitute( + base_type_call=base_type_call, guard=guard + ) + call = check_tensorimpl_and_storage(call, unpacked_bindings) + return call + + def emit_history() -> str: + fn = "rebase" if modifies_arguments(f) and view_info is None else "set" + output_names = [r.name for r in differentiable_outputs] + # TODO: flatten allocates a std::vector, which could be expensive + outs = CodeTemplate("flatten_tensor_args( ${outs} )").substitute( + outs=output_names if not is_inplace_foreach else "self" + ) + if not is_inplace_foreach: + return SET_HISTORY.substitute(fn=fn, differentiable_outputs=outs) + else: + return LOOP_OVER_VECTOR_OF_GRAD_FNS.substitute( + preamble=( + f"auto differentiable_outputs = {outs};\n" + f"TORCH_INTERNAL_ASSERT(differentiable_outputs.size() == grad_fns.size());" + ), + statements=f"{fn}_history(differentiable_outputs[i], grad_fns[i]);", + ) + + def emit_save_outputs() -> str: + if is_out_fn: + # out functions don't currently support differentiation + return "" + if info is not None and info.has_derivatives: + stmts = save_variables(info.all_saved_outputs, True) + if len(stmts) == 0: + return "" + if not is_inplace_foreach: + return CONDITIONAL.substitute(cond="grad_fn", statements=stmts) + else: + return LOOP_OVER_VECTOR_OF_GRAD_FNS.substitute( + preamble="", statements=stmts + ) + return "" + + def emit_any_requires_grad() -> list[str]: + extra_condition = "" + if info and info.output_differentiability_conditions: + assert len(info.output_differentiability_conditions) == 1 + extra_condition = f"_any_requires_grad &= ({info.output_differentiability_conditions[0]});" + names_of_args_with_derivatives = [arg.name for arg in args_with_derivatives] + if is_inplace_foreach and info is not None: + for i, arg in enumerate(names_of_args_with_derivatives): + for f_arg, r_arg in inplace_foreacharg2refarg.items(): + if arg == r_arg.name: + names_of_args_with_derivatives[i] = f_arg.name + return [ + SETUP_ANY_REQUIRES_GRAD.substitute( + args_with_derivatives=names_of_args_with_derivatives, + extra_differentiability_conditions=extra_condition, + ) + ] + + def 
get_any_has_forward_grad_name(var_names: tuple[str, ...]) -> str:
+        if len(var_names) == 1:
+            return f"_any_has_forward_grad_{var_names[0]}"
+        else:
+            return f"_any_has_forward_grad_{'_'.join(var_names)}"
+
+    def emit_any_has_forward_grad() -> list[str]:
+        content: list[str] = []
+        if not is_foreach:
+            for derivative in fw_derivatives:
+                requires_fw_grad = get_any_has_fw_grad_cond(derivative=derivative)
+                if info and info.output_differentiability_conditions:
+                    assert len(info.output_differentiability_conditions) == 1
+                    requires_fw_grad = f"({info.output_differentiability_conditions[0]}) && {requires_fw_grad}"
+                content.append(
+                    f"[[maybe_unused]] auto {get_any_has_forward_grad_name(derivative.var_names)} = {requires_fw_grad};"
+                )
+        else:
+            for derivative in fw_derivatives:
+                bool_vector_name = get_any_has_forward_grad_name(derivative.var_names)
+                cur_derivative_conditions = []
+                for inp in differentiable_inputs:
+                    if derivative.required_inputs_fw_grad is None:
+                        continue
+                    if inp.name not in derivative.required_inputs_fw_grad:
+                        continue
+                    inp_name = (
+                        inp.name
+                        if not inplace
+                        else refargname2inplace_foreacharg[inp.name].name
+                    )
+                    inp_type = (
+                        inp.type
+                        if not inplace
+                        else refargname2inplace_foreacharg[inp.name].type
+                    )
+                    is_list_type = is_tensor_list_type(inp_type)
+                    if is_list_type:
+                        if inp_name != "self":
+                            content.append(
+                                FW_DERIVATIVE_SIZE_CHECK_TEMPLATE.substitute(
+                                    inp_name=inp_name
+                                )
+                            )
+                        cur_derivative_conditions.append(
+                            # pyrefly: ignore [bad-argument-type]
+                            FW_DERIVATIVE_CHECK_TEMPLATE.substitute(
+                                req_inp=inp_name + "[i]"
+                            )
+                        )
+                    else:
+                        cur_derivative_conditions.append(
+                            # pyrefly: ignore [bad-argument-type]
+                            FW_DERIVATIVE_CHECK_TEMPLATE.substitute(req_inp=inp_name)
+                        )
+
+                content.append(f"std::vector<bool> {bool_vector_name}(self.size());")
+                content.append("for (const auto& i : c10::irange(self.size())) {")
+                content.append(
+                    f"  {bool_vector_name}[i] = {' || '.join(cur_derivative_conditions)};"
+                )
+                content.append("}")
+        return content
+
+    def emit_check_inplace() -> list[str]:
+        if not inplace:
+            return []
+        return [
+            f"check_inplace({arg.name}, _any_requires_grad);"
+            for arg in differentiable_outputs
+        ]
+
+    def emit_fw_derivatives() -> list[str]:
+        content: list[str] = []
+        fw_grad_setters: list[str] = []
+        for derivative in fw_derivatives:
+            res = derivative.var_names
+            if f.func.name.name.inplace:
+                assert len(res) == 1, (
+                    "Expected number of outputs to be 1 if function is inplace"
+                )
+                # TODO update this when inplace namings are unified
+                res = ("self",)
+
+            assert derivative.required_inputs_fw_grad is not None
+
+            unpacked_arguments = ""
+            for inp in differentiable_inputs:
+                inp_name = inp.name
+                is_input_tensorlist = is_foreach and is_tensor_list_type(
+                    inp.type
+                    if not inplace
+                    else refargname2inplace_foreacharg[inp.name].type
+                )
+                input_suffix = "[i]" if is_input_tensorlist else ""
+                if is_inplace_foreach:
+                    if inp.name in refargname2inplace_foreacharg:
+                        inp_name = refargname2inplace_foreacharg[inp.name].name
+                zeros_fn = (
+                    "zeros_symint"
+                    if inplace and inp.name == "self"
+                    else "_efficientzerotensor_symint"
+                )
+                if inp.name in derivative.required_inputs_fw_grad:
+                    unpacked_arguments += (
+                        FW_DERIVATIVE_DEFINED_GRAD_TEMPLATE.substitute(
+                            inp_name=inp.name,
+                            inp=inp_name + input_suffix,
+                            zeros_fn=zeros_fn,
+                        )
+                    )
+                    if zeros_fn == "_efficientzerotensor_symint":
+                        unpacked_arguments += (
+                            FW_DERIVATIVE_UPDATE_WRAPPED_NUM_TEMPLATE.substitute(
+                                inp_name=inp.name
+                            )
+                        )
+
+                if inp.name in (derivative.required_inputs_primal
or []): + unpacked_arguments += ( + FW_DERIVATIVE_DEFINED_PRIMAL_TEMPLATE.substitute( + inp_name=inp.name, + inp=inp_name + input_suffix, + ) + ) + if derivative.required_original_self_value: + input_suffix = "s[i]" if is_inplace_foreach else "" + unpacked_arguments += FW_DERIVATIVE_DEFINED_GRAD_TEMPLATE.substitute( + inp_name="original_self", + inp="original_self" + input_suffix, + # pyrefly: ignore [unbound-name] + zeros_fn=zeros_fn, + ) + unpacked_arguments += FW_DERIVATIVE_DEFINED_PRIMAL_TEMPLATE.substitute( + inp_name="original_self", + inp="original_self" + input_suffix, + ) + elif inplace and derivative.is_reusing_outplace_formula: + # The gradient wasn't already cloned, do it if grad mode is enabled + unpacked_arguments += ( + "self_t = GradMode::is_enabled() ? self_t.clone() : self_t;" + ) + + if inplace: + is_inplace_str = "true" + else: + is_inplace_str = "false" + + requires_fw_grad = get_any_has_forward_grad_name(derivative.var_names) + + if all( + (isinstance(var_type, BaseType) and var_type.is_tensor_like()) + for var_type in derivative.var_types + ): + # Is there a way to get from BaseType to BaseCType + if len(derivative.var_types) == 1: + opt_res_grad_type = OptionalCType(BaseCType(tensorT)).cpp_type() + if not is_foreach: + fw_grad_setters.append( + FW_DERIVATIVE_SETTER_TENSOR.substitute( + out_arg=res[0], is_inplace=is_inplace_str + ) + ) + else: + assert res[0] == ("result" if not inplace else "self") + fw_grad_setters.append( + FW_DERIVATIVE_SETTER_TENSOR_FOREACH.substitute( + out_arg=res[0], is_inplace=is_inplace_str + ) + ) + requires_fw_grad += f" && ({derivative.var_names[0]}.defined())" + else: + tuple_type = TupleCType( + [BaseCType(tensorT)] * len(derivative.var_types) + ) + opt_res_grad_type = OptionalCType(tuple_type).cpp_type() + for idx, single_res in enumerate(res): + fw_grad_setters.append( + FW_DERIVATIVE_SETTER_MULTI_OUTPUT.substitute( + idx=idx, all_res="_".join(res), out_arg=single_res + ) + ) + elif ( + isinstance(derivative.var_types[0], ListType) + and derivative.var_types[0].is_tensor_like() + ): + assert len(derivative.var_types) == 1, ( + "Expected number of outputs to be 1 if function returns ListType" + ) + if not is_foreach: + opt_res_grad_type = OptionalCType( + VectorCType(BaseCType(tensorT)) + ).cpp_type() + fw_grad_setters.append( + FW_DERIVATIVE_SETTER_TENSOR_LIST.substitute( + out_arg=res[0], is_inplace=is_inplace_str + ) + ) + else: + # TODO(crcrpar): Should this (= the foreach specific logic) be refactored somehow? + # Only out-place foreach functions that have entries in `tools/autograd/derivatives.yaml` + # can reach here. + opt_res_grad_type = OptionalCType(BaseCType(tensorT)).cpp_type() + fw_grad_setters.append( + FW_DERIVATIVE_SETTER_TENSOR_FOREACH.substitute( + out_arg=res[0], is_inplace=is_inplace_str + ) + ) + else: + raise RuntimeError("Unsupported output type for forward derivative") + + if not is_foreach: + fw_grad_opt_definition = f"{opt_res_grad_type} {'_'.join(res)}_new_fw_grad_opt = ::std::nullopt;" + # View ops create fw_grad that already is a view of the base's fw_grad so just use that + content.append( + FW_DERIVATIVE_TEMPLATE.substitute( + fw_grad_opt_definition=fw_grad_opt_definition, + requires_fw_grad=requires_fw_grad, + formula=derivative.formula, + out_arg="_".join(res), + unpacked_arguments=unpacked_arguments, + ) + ) + else: + # note(crcrpar): Assuming `self` is TensorList. 
+                fw_grad_opt_definition = (
+                    f"std::vector<{opt_res_grad_type}> {'_'.join(res)}_new_fw_grad_opts"
+                    "(self.size(), ::std::nullopt);"
+                )
+                foreach_forward_grad_formula = derivative.formula
+                _foreach_arg: Argument | DifferentiableInput
+                if inplace:
+                    for _foreach_arg, _ref_arg in inplace_foreacharg2refarg.items():
+                        # note(crcrpar): Massage only Scalar and ArrayRef here.
+                        if not (
+                            is_tensor_type(_foreach_arg.type)
+                            or is_tensor_list_type(_foreach_arg.type)
+                        ):
+                            pattern = _foreach_arg.name
+                            if isinstance(_foreach_arg.type, ListType):
+                                pattern += "[i]"
+                            foreach_forward_grad_formula = (
+                                foreach_forward_grad_formula.replace(
+                                    _ref_arg.name, pattern
+                                )
+                            )
+                else:
+                    if (
+                        "result" in foreach_forward_grad_formula
+                        and "result[i]" not in foreach_forward_grad_formula
+                    ):
+                        foreach_forward_grad_formula = (
+                            foreach_forward_grad_formula.replace("result", "result[i]")
+                        )
+
+                content.append(
+                    FW_DERIVATIVE_FOREACH_TEMPLATE.substitute(
+                        fw_grad_opt_definition=fw_grad_opt_definition,
+                        vector_of_optional_tensor=f"{'_'.join(res)}_new_fw_grad_opts",
+                        any_has_forward_grad_for_current_index=" || ".join(
+                            get_any_has_forward_grad_name(derivative.var_names) + "[i]"
+                            for derivative in fw_derivatives
+                        ),
+                        formula=foreach_forward_grad_formula,
+                        unpacked_arguments=unpacked_arguments,
+                    )
+                )
+
+        # Set all the grads at the end to avoid: https://github.com/pytorch/pytorch/issues/67367
+        content.append("\n".join(fw_grad_setters))
+        return content
+
+    def get_any_has_fw_grad_cond(derivative: ForwardDerivative | None) -> str:
+        #
+        # Produces a condition string (e.g., "isFwGradDefined(grad_output) || isFwGradDefined(output)")
+        #
+        if derivative is None:
+            # (1) If a derivative is NOT provided, cond will check fw_grad of ALL differentiable inputs
+            #     - Used in the out_fn case when we want to forbid fw derivatives
+            #     - Used in the case where the fw_derivative is not defined, but we want
+            #       to check if there is a decomposition registered for jvp
+            to_check: list[str] = []
+            for inp in list(
+                mapMaybe(
+                    gen_differentiable_input,
+                    f.func.arguments.non_out + list(f.func.arguments.out),  # type: ignore[operator]
+                )
+            ):
+                if is_tensor_type(inp.type):
+                    to_check.append(
+                        FW_DERIVATIVE_CHECK_TEMPLATE.substitute(req_inp=inp.name)
+                    )
+                elif is_tensor_list_type(inp.type):
+                    to_check.append(
+                        FW_DERIVATIVE_TENSORLIST_CHECK_TEMPLATE.substitute(
+                            req_inp=inp.name
+                        )
+                    )
+                else:
+                    raise RuntimeError(
+                        f'Unsupported input type for "{name}" when forbidding forward AD usage.'
+                    )
+            return f"({' || '.join(to_check)})"
+        else:
+            # (2) If derivative is provided, use that information to determine which inputs
+            #     to check fw_grad for
+            assert derivative.required_inputs_fw_grad is not None
+
+            if len(derivative.required_inputs_fw_grad) == 0:
+                # Handle functions like stack
+                # For these, we don't unpack anything and always call the user function
+                if not (
+                    len(differentiable_inputs) == 1
+                    and is_tensor_list_type(differentiable_inputs[0].type)
+                ):
+                    raise RuntimeError(
+                        f'No differentiable input to "{name}" is a differentiable Tensor (as the provided '
+                        "forward AD formula does not use any input tangent) even though a forward gradient "
+                        "formula has been defined for it. This case should only happen for functions that "
+                        "take a single TensorList as input. All other cases are not supported right now."
+ ) + any_has_fw_grad = "true" + else: + any_has_fw_grad = " || ".join( + [ + ( + FW_DERIVATIVE_TENSORLIST_CHECK_TEMPLATE + if is_tensor_list_type(inp.type) + else FW_DERIVATIVE_CHECK_TEMPLATE + ).substitute(req_inp=inp.name) + for inp in differentiable_inputs + if inp.name in derivative.required_inputs_fw_grad + ] + ) + any_has_fw_grad = f"({any_has_fw_grad})" + + return any_has_fw_grad + + def emit_forbid_fw_derivatives(is_out_fn: bool = False) -> str: + if is_out_fn: + msg = "because it is an out= function" + else: + msg = ( + "because it has not been implemented yet.\\nPlease file an issue " + "to PyTorch at https://github.com/pytorch/pytorch/issues/new?template=feature-request.yml " + "so that we can prioritize its implementation." + ) + cond = get_any_has_fw_grad_cond(derivative=None) + return ( + FW_DERIVATIVE_FORBID_TEMPLATE.substitute(cond=cond, name=name, msg=msg) + if cond != "" + else "" + ) + + body: list[str] = [] + unpack_args_stats, unpacked_bindings = unpack_args(f) + + body.extend(unpack_args_stats) + if requires_derivative: + body.extend(emit_any_requires_grad()) + body.extend(emit_any_has_forward_grad()) + body.extend(emit_check_inplace()) + body.extend(emit_original_self_definition()) + body.extend(setup_derivative(differentiable_inputs)) + + body.append(emit_call(f, unpacked_bindings, try_jit_decomposition)) + if requires_derivative: + # set_flags has to appear after version_counter, because rebase_history + # requires that the counter is incremented before it is called + body.append(emit_history()) + body.extend(emit_check_if_in_complex_autograd_allowlist()) + + if is_out_fn: + body.append(emit_forbid_fw_derivatives(is_out_fn=True)) + else: + if requires_derivative and not try_jit_decomposition: + if len(fw_derivatives) > 0: + body.extend(emit_fw_derivatives()) + else: + body.append(emit_forbid_fw_derivatives()) + + if requires_derivative: + # Save only after the forward AD has been set up + body.append(emit_save_outputs()) + + if str(f.func.name.name) in RESET_GRAD_ACCUMULATOR: + # `inplace` implies that there is exactly one output named `self`, + # so we can keep the generated code easy. If you need to + # `reset_grad_accumulator` in an operator that's not `inplace`, you can + # remove this assert but the code generation will get more elaborate + assert inplace + body.append("reset_grad_accumulator(self);") + if not returns_void: + body.append(f"return {get_return_value(f)};") + return body diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_view_funcs.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_view_funcs.py new file mode 100644 index 0000000000000000000000000000000000000000..8cc8a2ffcecc4571c5101a265be3a5eeb766473a --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/gen_view_funcs.py @@ -0,0 +1,339 @@ +# Generates ViewFuncs.h/cpp +# +# NOTE: If any changes are being made to the ViewFunc codegen please also check +# if updates are needed in torch/csrc/autograd/autograd_not_implemented_fallback.cpp +# The fallback is expected to mimic this codegen, so we should keep the two in sync. 
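+#
+# As a rough, hypothetical sketch (details vary per schema), the generated
+# struct for a view op like `narrow` looks approximately like:
+#
+#   struct NarrowViewFunc : public torch::autograd::ViewFunc {
+#     NarrowViewFunc(int64_t dim, c10::SymInt start, c10::SymInt length)
+#         : dim(dim), start(start), length(length) {}
+#     at::Tensor operator()(const at::Tensor& input_base) const override;
+#     // ... symint/tensor state accessors elided ...
+#    private:
+#     int64_t dim;
+#     c10::SymInt start;
+#     c10::SymInt length;
+#   };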
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+import torchgen.api.dispatcher as dispatcher
+from torchgen.api.translate import translate
+from torchgen.api.types import (
+    BaseCType,
+    Binding,
+    NamedCType,
+    SymIntT,
+    tensorT,
+    VectorCType,
+)
+from torchgen.code_template import CodeTemplate
+from torchgen.model import Argument, NativeFunction, OptionalType
+from torchgen.utils import FileManager
+
+from .gen_inplace_or_view_type import (
+    CALL_DISPATCH,
+    extract_bindings,
+    get_view_info,
+    modifies_arguments,
+    use_derived,
+)
+
+
+if TYPE_CHECKING:
+    from torchgen.api.autograd import NativeFunctionWithDifferentiabilityInfo
+
+
+FUNCTION_DECLARATION = CodeTemplate(
+    """\
+#define ${uppercase_op}_AVAILABLE
+struct ${op} : public ${superclass} {
+  ${op}(${constructor_args}) ${initializer_list}
+  {}
+  virtual ~${op}() override = default;
+  virtual std::vector<c10::SymInt> get_symints() const override;
+  virtual size_t num_symints() const override;
+  virtual std::vector<at::Tensor> get_tensors() const override;
+  virtual size_t num_tensors() const override;
+  virtual at::Tensor operator()(const at::Tensor&) const override;
+  virtual std::unique_ptr<ViewFunc> clone_and_set(
+      std::optional<std::vector<c10::SymInt>> = ::std::nullopt,
+      std::optional<std::vector<at::Tensor>> = ::std::nullopt) const override;
+
+protected:
+  virtual void set_symints(std::vector<c10::SymInt>) override;
+  virtual void set_tensors(std::vector<at::Tensor>) override;
+
+private:
+  ${state}
+};
+
+"""
+)
+
+FUNCTION_DEFINITION = CodeTemplate(
+    """\
+std::vector<c10::SymInt> ${op}::get_symints() const {
+  ${get_symints}
+}
+
+size_t ${op}::num_symints() const {
+  return static_cast<size_t>(${num_symints});
+}
+
+void ${op}::set_symints(std::vector<c10::SymInt> ${symints_vec}) {
+  TORCH_INTERNAL_ASSERT(${symints_vec}.size() == num_symints());
+  ${set_symints}
+}
+
+std::vector<at::Tensor> ${op}::get_tensors() const {
+  ${get_tensors}
+}
+
+size_t ${op}::num_tensors() const {
+  return static_cast<size_t>(${num_tensors});
+}
+
+void ${op}::set_tensors(std::vector<at::Tensor> ${tensors_vec}) {
+  TORCH_INTERNAL_ASSERT(${tensors_vec}.size() == num_tensors());
+  ${set_tensors}
+}
+
+at::Tensor ${op}::operator()(const at::Tensor& ${call_input_name}) const {
+  return ${op_call};
+}
+
+std::unique_ptr<ViewFunc> ${op}::clone_and_set(
+    std::optional<std::vector<c10::SymInt>> ${symints_vec},
+    std::optional<std::vector<at::Tensor>> ${tensors_vec}) const {
+  auto output = std::make_unique<${op}>(${clone_args});
+  if (${symints_vec}.has_value()) {
+    output->set_symints(std::move(*(${symints_vec})));
+  }
+  if (${tensors_vec}.has_value()) {
+    output->set_tensors(std::move(*(${tensors_vec})));
+  }
+  return output;
+}
+
+"""
+)
+
+
+# e.g.
as_strided -> AsStridedViewFunc for camel case or +# as_strided_view_func otherwise +def view_func_name( + f: NativeFunction, include_namespace: bool = False, camel_case: bool = True +) -> str: + name = f.func.name.unambiguous_name() + view_func_name = f"{name.replace('.', '_')}_view_func" + if camel_case: + is_private = view_func_name.startswith("_") + view_func_name = "".join( + [p.title() for p in view_func_name.replace(".", "_").split("_")] + ) + if is_private: + # put the leading underscore back in + view_func_name = f"_{view_func_name}" + namespace = "torch::autograd::generated::" if include_namespace else "" + return f"{namespace}{view_func_name}" + + +def is_symint_or_tensor(arg: Argument) -> bool: + return arg.type.is_tensor_like() or arg.type.is_symint_like() + + +def remove_const_ref(binding: Binding) -> Binding: + return Binding( + name=binding.name, + nctype=binding.nctype.remove_const_ref(), + argument=binding.argument, + default=binding.default, + ) + + +def returns_multi_tensor(fn: NativeFunction) -> bool: + returns = fn.func.returns + assert len(returns) == 1 + returns_list_like = returns[0].type.is_list_like() is not None + returns_tensor_like = returns[0].type.is_tensor_like() + return returns_list_like and returns_tensor_like + + +# Generates strings with logic for getting / setting state of a particular type. +# +# Args: +# bindings (list): List of state bindings of interest (may be empty) +# state_vec_type (NamedCType): Type of vector to either return or copy from +# +# Returns: +# tuple: (list of getter logic strings, list of setter logic strings, string +# with num items expression) +def generate_state_getter_setter( + bindings: list[Binding], + state_vec_type: NamedCType, +) -> tuple[list[str], list[str], str]: + getter_logic = [] + setter_logic = [] + + state_vec = state_vec_type.name + getter_logic.append(f"{state_vec_type.cpp_type()} {state_vec};") + if len(bindings) > 0: + setter_logic.append("auto i = 0;") + + num_exprs = [] + for i, b in enumerate(bindings): + assert isinstance(b.argument, Argument) + if b.argument.type.is_list_like(): + # Handle list-likes. + num_expr = f"{b.name}.size()" + num_exprs.append(num_expr) + getter = f"{state_vec}.insert({state_vec}.end(), {b.name}.begin(), {b.name}.end());" + setter = f"std::copy({state_vec}.begin() + i, {state_vec}.begin() + i + {b.name}.size(), {b.name}.begin());" + elif isinstance(b.argument.type, OptionalType): + # Handle optionals. + num_expr = f"({b.name}.has_value() ? 1 : 0)" + num_exprs.append(num_expr) + conditional = f"if({b.name}.has_value())" + getter = ( + f"{conditional} {state_vec}.insert({state_vec}.end(), *({b.name}));" + ) + setter = f"{conditional} {b.name} = {state_vec}[i];" + else: + num_expr = "1" + num_exprs.append(num_expr) + getter = f"{state_vec}.push_back({b.name});" + setter = f"{b.name} = {state_vec}[i];" + + getter_logic.append(getter) + setter_logic.append(setter) + if i < len(bindings) - 1: + setter_logic.append(f"i += {num_expr};") + + # Reserve / assert based on the total number of items expression. 
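+    # (Hypothetical illustration: for bindings [size: SymInt[], storage_offset: SymInt?]
+    # num_items works out to "size.size() + (storage_offset.has_value() ? 1 : 0)".)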
+ num_items = "0" if len(num_exprs) == 0 else " + ".join(num_exprs) + if len(bindings) > 0: + getter_logic.insert(1, f"{state_vec}.reserve({num_items});") + + getter_logic.append(f"return {state_vec};") + + return getter_logic, setter_logic, num_items + + +def process_function(fn: NativeFunction, template: CodeTemplate) -> str: + bindings = extract_bindings(fn) + non_self_bindings = [b for b in bindings if b.name != "self"] + + non_self_args = fn.func.arguments.flat_all[1:] + non_self_value_bindings = [ + dispatcher.argument(a, remove_non_owning_ref_types=True) for a in non_self_args + ] + + # Generate constructor / clone args for the generated struct. + constructor_args = [b.defn() for b in non_self_bindings] + clone_args = [b.name for b in non_self_bindings] + + # Generate state variable declarations for the generated struct. + state_variables = [ + f"{remove_const_ref(b).defn()};" for b in non_self_value_bindings + ] + + # Generate initializer list expressions for the generated struct. + # allow_expensive_conversions=True because we need to store e.g. SymIntArrayRefs as + # vectors. + init_exprs = translate( + non_self_bindings, non_self_value_bindings, allow_expensive_conversions=True + ) + initializers = [] + for b, init_expr in zip(non_self_bindings, init_exprs): + name = b.nctype.name + assert isinstance(name, str) + initializers.append(f"{name}({init_expr.expr})") + + # Generate call to underlying view op + call_input_name = "input_base" + op_call_args = [call_input_name, *(b.name for b in non_self_bindings)] + op_call = CALL_DISPATCH.substitute( + unambiguous_name=fn.func.name.unambiguous_name(), + unpacked_args=op_call_args, + ) + + # Multi-output views additionally require a view_idx for disambiguation. + if returns_multi_tensor(fn): + view_idx_name = "view_idx" + view_idx_typename = "int64_t" + view_idx_decl = f"{view_idx_typename} {view_idx_name}" + constructor_args.append(view_idx_decl) + clone_args.append(view_idx_name) + state_variables.append(f"{view_idx_decl};") + initializers.append(f"{view_idx_name}({view_idx_name})") + op_call += f"[{view_idx_name}]" + + # Generate initializer list for the generated struct. + initializer_list = f": {', '.join(initializers)}" if len(initializers) > 0 else "" + + # Generate getter / setter logic for any symints. + symint_bindings = [ + b + for b in non_self_bindings + if isinstance(b.argument, Argument) and b.argument.type.is_symint_like() + ] + symints_vec_type = NamedCType("symints", VectorCType(BaseCType(SymIntT))) + get_symints, set_symints, num_symints = generate_state_getter_setter( + symint_bindings, symints_vec_type + ) + + # Generate getter / setter logic for any tensors. 
+    tensor_bindings = [
+        b
+        for b in non_self_bindings
+        if isinstance(b.argument, Argument) and b.argument.type.is_tensor_like()
+    ]
+    tensors_vec_type = NamedCType("tensors", VectorCType(BaseCType(tensorT)))
+    get_tensors, set_tensors, num_tensors = generate_state_getter_setter(
+        tensor_bindings, tensors_vec_type
+    )
+
+    return template.substitute(
+        op=view_func_name(fn),
+        uppercase_op=view_func_name(fn, camel_case=False).upper(),
+        superclass="torch::autograd::ViewFunc",
+        initializer_list=initializer_list,
+        state=state_variables,
+        constructor_args=constructor_args,
+        clone_args=clone_args,
+        symints_vec=symints_vec_type.name,
+        get_symints=get_symints,
+        set_symints=set_symints,
+        num_symints=num_symints,
+        tensors_vec=tensors_vec_type.name,
+        get_tensors=get_tensors,
+        set_tensors=set_tensors,
+        num_tensors=num_tensors,
+        call_input_name=call_input_name,
+        op_call=op_call,
+    )
+
+
+def gen_view_funcs(
+    out: str,
+    fns_with_infos: list[NativeFunctionWithDifferentiabilityInfo],
+    template_path: str,
+) -> None:
+    # don't need the info parts, just the function
+    fns = [fn.func for fn in fns_with_infos if use_derived(fn)]
+    # only want out-of-place views
+    view_fns = [
+        fn for fn in fns if get_view_info(fn) is not None and not modifies_arguments(fn)
+    ]
+
+    declarations = [process_function(fn, FUNCTION_DECLARATION) for fn in view_fns]
+    definitions = [process_function(fn, FUNCTION_DEFINITION) for fn in view_fns]
+    ops_headers = [f"#include <ATen/ops/{fn.root_name}_ops.h>" for fn in view_fns]
+
+    file_basename = "ViewFuncs"
+    fm = FileManager(install_dir=out, template_dir=template_path, dry_run=False)
+    for suffix in [".h", ".cpp"]:
+        fname = file_basename + suffix
+        fm.write_with_template(
+            fname,
+            fname,
+            lambda: {
+                "generated_comment": "@"
+                + f"generated from {fm.template_dir_for_comments()}/{fname}",
+                "view_func_declarations": declarations,
+                "view_func_definitions": definitions,
+                "ops_headers": ops_headers,
+            },
+        )
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/load_derivatives.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/load_derivatives.py
new file mode 100644
index 0000000000000000000000000000000000000000..59669b42cd5d45643306f6fd83bf3adb73b6c288
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/load_derivatives.py
@@ -0,0 +1,1025 @@
+# Parses derivatives.yaml into autograd functions
+#
+# Each autograd function is represented by `DifferentiabilityInfo` containing
+# a list of `Derivative`. See `torchgen.api.autograd` for the data models.
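For orientation before the implementation: a minimal, editorial sketch of the input this loader consumes. The `sin` entry mirrors the stock element-wise formula from PyTorch's derivatives.yaml; the loader treats `name` as the ATen schema and every other key as a per-input (backward) or per-output (forward) formula.

    # Editorial sketch, not part of the module: one derivatives.yaml entry.
    import yaml

    entry = yaml.safe_load(
        """
    - name: sin(Tensor self) -> Tensor
      self: grad * self.cos()
      result: auto_element_wise
    """
    )[0]

    # 'name' holds the ATen schema string looked up in functions_by_schema;
    # 'self' is a backward formula; 'result' marks the forward derivative.
    assert entry["name"] == "sin(Tensor self) -> Tensor"
    assert entry["result"] == "auto_element_wise"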
+ +from __future__ import annotations + +import re +from collections import Counter, defaultdict +from typing import Any, TYPE_CHECKING + +import yaml + +from torchgen.api import cpp +from torchgen.api.autograd import ( + Derivative, + DifferentiabilityInfo, + ForwardDerivative, + SavedAttribute, +) +from torchgen.api.types import ( + BaseCType, + Binding, + boolT, + CppSignatureGroup, + layoutT, + longT, + NamedCType, + OptionalCType, + scalarTypeT, + SpecialArgName, + stringT, + symIntArrayRefT, + SymIntT, + tensorGeometryT, + tensorOptionsT, + typeAndSizeT, + VectorCType, +) +from torchgen.context import with_native_function +from torchgen.gen import get_grouped_by_view_native_functions, parse_native_yaml +from torchgen.model import ( + AUTOGRAD_KEYS, + FunctionSchema, + NativeFunction, + NativeFunctionsViewGroup, + OperatorName, + SchemaKind, + Type, + Variant, +) +from torchgen.utils import concatMap, IDENT_REGEX, split_name_params +from torchgen.yaml_utils import YamlLoader + + +if TYPE_CHECKING: + from collections.abc import Sequence + + +DerivativeRet = tuple[dict[FunctionSchema, dict[str, DifferentiabilityInfo]], set[str]] + +_GLOBAL_LOAD_DERIVATIVE_CACHE: dict[tuple[str, str], DerivativeRet] = {} + +_VALID_AUTOGRAD_KEYS = set(AUTOGRAD_KEYS) + + +# This function directly adds per-dispatchkey derivative entries for {view}_copy variants of each view op. +# Since every {view} and {view}_copy op shares the same derivative formula, +# we generate them here instead of duplicating them in the yaml. +# See Note [Codegen'd {view}_copy Operators] +def add_view_copy_derivatives( + infos: dict[FunctionSchema, dict[str, DifferentiabilityInfo]], + view_groups: list[NativeFunctionsViewGroup], +) -> None: + # Get the map from each view op's name to its corresponding view group + view_name_to_group: dict[OperatorName, NativeFunctionsViewGroup] = { + g.view.func.name: g for g in view_groups + } + + view_infos = {} + + for info_dispatch_dict in infos.values(): + # maybe_view_group only needs to be calculated once per info_dispatch_dict + maybe_view_group = None + view_copy_differentiability_infos = {} + for dispatch_key, info in info_dispatch_dict.items(): + maybe_view_group = view_name_to_group.get(info.func.func.name, None) + if maybe_view_group is not None and maybe_view_group.view_copy is not None: + view_copy_info = info.create_view_copy_from_view_derivative( + maybe_view_group + ) + if view_copy_info is not None: + fn_schema = view_copy_info.func.func + view_copy_differentiability_infos[dispatch_key] = view_copy_info + else: + break + # prefer manually-defined derivatives if any + # pyrefly: ignore [unbound-name] + if len(view_copy_differentiability_infos) > 0 and fn_schema not in infos: + # pyrefly: ignore [unbound-name] + assert fn_schema is not None + # pyrefly: ignore [unbound-name] + view_infos[fn_schema] = view_copy_differentiability_infos + + infos.update(view_infos) + + +def load_derivatives( + derivatives_yaml_path: str, native_yaml_path: str, tags_yaml_path: str +) -> DerivativeRet: + # Do some caching as this is a deterministic function + global _GLOBAL_LOAD_DERIVATIVE_CACHE + key = (derivatives_yaml_path, native_yaml_path) + if key not in _GLOBAL_LOAD_DERIVATIVE_CACHE: + with open(derivatives_yaml_path) as f: + definitions = yaml.load(f, Loader=YamlLoader) + + funcs = parse_native_yaml(native_yaml_path, tags_yaml_path).native_functions + # From the parsed native functions, separate out the (generated) view_copy functions, + # so we can generate derivatives for them separately. 
+        native_functions_with_view_groups = get_grouped_by_view_native_functions(funcs)
+        native_functions = concatMap(
+            lambda g: [g]
+            if isinstance(g, NativeFunction)
+            else list(g.functions(include_copy=True)),
+            native_functions_with_view_groups,
+        )
+        view_groups = [
+            g
+            for g in native_functions_with_view_groups
+            if isinstance(g, NativeFunctionsViewGroup)
+        ]
+
+        # What's the difference between a function schema and a signature?
+        # A function schema is the complete declaration, including mutability
+        # annotations, default values, etc. A signature is the canonical schema
+        # for a group of semantically related functions (the in-place/out/
+        # functional variants); see the sketch just after this function.
+        functions_by_signature: dict[FunctionSchema, list[NativeFunction]] = (
+            defaultdict(list)
+        )
+        functions_by_schema: dict[str, NativeFunction] = {}
+        for function in native_functions:
+            functions_by_signature[function.func.signature()].append(function)
+            assert str(function.func) not in functions_by_schema
+            functions_by_schema[str(function.func)] = function
+
+        # Keep track of how many of which ops we've seen so we can
+        # disambiguate them with a numeric suffix.
+        op_counter = Counter[str]()
+
+        # infos is a dict that maps FunctionSchema -> a dict of per-dispatch-key
+        # DifferentiabilityInfos. This is useful because in
+        # tools/autograd/gen_autograd.py:match_differentiability_info we
+        # ultimately need to categorize the DifferentiabilityInfos by
+        # FunctionSchema.
+        infos: dict[FunctionSchema, dict[str, DifferentiabilityInfo]] = {}
+        used_dispatch_keys: set[str] = set()
+        for defn_dict in definitions:
+            # Ensure that the old derivatives.yaml schema with no dispatch key can be loaded.
+            if "dispatch" not in defn_dict:
+                specification = defn_dict.pop("name")
+                output_differentiability = defn_dict.pop(
+                    "output_differentiability", None
+                )
+                defn_dict = {"name": specification, "dispatch": {"Default": defn_dict}}
+                if output_differentiability:
+                    defn_dict["output_differentiability"] = output_differentiability
+            name, per_dispatch_diffinfos = create_differentiability_info(
+                defn_dict,
+                functions_by_signature,
+                functions_by_schema,
+                op_counter,
+                used_dispatch_keys,
+            )
+            infos[name] = per_dispatch_diffinfos
+
+        add_view_copy_derivatives(infos, view_groups)
+
+        # Cache both the loaded infos and the set of all dispatch keys/aliases
+        # that appear in derivatives.yaml. used_dispatch_keys is useful for
+        # generating VariableType.cpp, where we need a TORCH_LIBRARY_IMPL for
+        # every autograd dispatch key used.
+        _GLOBAL_LOAD_DERIVATIVE_CACHE[key] = infos, used_dispatch_keys
+
+    return _GLOBAL_LOAD_DERIVATIVE_CACHE[key]
+
+
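The schema-vs-signature distinction above is easiest to see on a concrete pair of variants. A small sketch (the two schemas are real ATen declarations; the comparison mirrors how `functions_by_signature` buckets them):

    from torchgen.model import FunctionSchema

    functional = FunctionSchema.parse(
        "add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor"
    )
    inplace = FunctionSchema.parse(
        "add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!)"
    )

    # Distinct schemas (different names, mutability annotations)...
    assert str(functional) != str(inplace)
    # ...but one canonical signature, so both variants land in the same
    # functions_by_signature bucket.
    assert functional.signature() == inplace.signature()

+# TODO: Why is this going through CppSignatureGroup, that doesn't make sense...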
+@with_native_function +def cpp_arguments(f: NativeFunction) -> Sequence[Binding]: + sigs = CppSignatureGroup.from_native_function(f, method=False) + if sigs.symint_signature is not None: + return sigs.symint_signature.arguments() + else: + return sigs.signature.arguments() + + +def create_derivative( + f: NativeFunction, + formula: str, + var_names: tuple[str, ...], + available_named_gradients: Sequence[str], +) -> Derivative: + original_formula = formula + arguments: list[NamedCType] = [ + a.nctype.remove_const_ref() for a in cpp_arguments(f) + ] + + return_names = tuple(n if n != "self" else "result" for n in cpp.return_names(f)) + return_types = tuple( + cpp.return_type(r, symint=True).remove_const_ref() for r in f.func.returns + ) + + named_returns = [ + NamedCType(name, type) for name, type in zip(return_names, return_types) + ] + + formula, saved_inputs = saved_variables(formula, arguments, var_names) + formula, saved_outputs = saved_variables(formula, named_returns, var_names) + + used_named_gradients = { + name + for name in available_named_gradients + if re.search(IDENT_REGEX.format(name), formula) + } + + # Check that the referenced derivatives in the formula are in bounds + for i in used_gradient_indices(formula): + if i >= len(f.func.returns): + raise RuntimeError( + f"Out of bounds grads access: derivative formula for {cpp.name(f.func)} " + f"used grads[{i}], but the forward only returns {len(f.func.returns)} outputs." + ) + + return Derivative( + formula=formula, + original_formula=original_formula, + var_names=var_names, + saved_inputs=saved_inputs, + saved_outputs=saved_outputs, + named_gradients=used_named_gradients, + ) + + +def create_forward_derivative( + f: NativeFunction, formula: str, names: tuple[str, ...] +) -> ForwardDerivative: + var_names = names + var_types: tuple[Type, ...] 
| None = None + for r in f.func.returns: + if r.name in var_names: + if var_types is None: + var_types = () + var_types = var_types + (r.type,) + + # Handle default return names + if var_types is None: + if var_names == ("result",): + assert len(f.func.returns) == 1 + var_types = (f.func.returns[0].type,) + else: + for var_name in var_names: + res = re.findall(r"^result(\d+)$", var_name) + if len(res) == 1: + if var_types is None: + var_types = () + arg_idx = int(res[0]) + var_types = var_types + (f.func.returns[arg_idx].type,) + + assert var_types is not None, "No matching output for forward derivative definition" + return ForwardDerivative( + formula=formula, + var_names=var_names, + var_types=var_types, + required_inputs_fw_grad=None, + required_inputs_primal=None, + required_original_self_value=False, + is_reusing_outplace_formula=False, + ) + + +def postprocess_forward_derivatives( + f: NativeFunction, + defn_name: str, + all_arg_names: list[str], + derivatives: list[Derivative], + forward_derivatives: list[ForwardDerivative], + args_with_derivatives: Sequence[Binding], +) -> list[ForwardDerivative]: + def find_required_inputs(formula: str, postfix: str) -> tuple[str, ...]: + is_foreach = f.func.name.name.base.startswith("_foreach_") + required_inputs = set() + for arg in args_with_derivatives: + if ( + arg.type in ("at::TensorList", "const at::ITensorListRef &") + and not is_foreach + ): + # The functions taking TensorList handle everything internally + continue + arg_name = arg.name + + found = re.search(IDENT_REGEX.format(arg_name), formula) + if found: + raise RuntimeError( + f"The forward formula for {defn_name} is using the base name of the {arg_name} " + f"argument which is ambiguous. You should use {arg_name}_p to access the primal " + f"value and {arg_name}_t to access the tangent." + ) + + found = re.search(IDENT_REGEX.format(arg_name + postfix), formula) + if found: + required_inputs.add(arg_name) + + return tuple(required_inputs) + + updated_derivatives: list[ForwardDerivative] = [] + + for defn in forward_derivatives: + formula = defn.formula + required_inputs_tangent = find_required_inputs(formula, "_t") + if formula == "auto_element_wise": + assert f.func.kind() != SchemaKind.inplace, ( + f"Cannot use auto_element_wise with {f.func.name} because it is an in-place variant" + ) + if ( + (not len(args_with_derivatives) == 1) + or len(forward_derivatives) > 1 + or len(forward_derivatives[0].var_names) > 1 + ): + raise RuntimeError( + f"Derivative definition of {defn_name} in derivatives.yaml defines the " + "forward definition of gradient as element_wise but this only " + "works for functions with a single differentiable input and a " + "single differentiable output." + ) + if not len(derivatives) == 1: + raise RuntimeError( + f"Derivative definition of {defn_name} in derivatives.yaml defines the " + "forward definition of gradient as element_wise but it does not " + "defines the gradient formula for its argument which is required." + ) + # This transformation is based on the observation that for element-wise functions, the Jacobian + # matrix is diagonal and thus doing J * v is the same as (v^T J)^T (in practice, we ignore the transpositions) + # For the complex case, we use hermitian transpose and get (v.conj() J).conj() + # So here we are going to reuse the backward formula and replace two things: + # 1) all occurrences of "grad" with "foo_t.conj()", where foo is the name of the unique differentiable input. 
+ # 2) all usage of an original input "foo" with its primal value "foo_p". + # 3) conjugate the final result + # For example, for abs, the backward formula is: + # grad * self.sgn() + # And this function generates a forward formula that is: + # (self_t.conj() * self_p.sgn()).conj() + + backward_formula = derivatives[0].original_formula + input_name = args_with_derivatives[0].name + + # Do replacement 1) of the grad + def repl(m: Any) -> str: + return f"{m.group(1)}{input_name}_t.conj(){m.group(2)}" + + fw_formula = re.sub(IDENT_REGEX.format("grad"), repl, backward_formula) + + # Do replacement 2) of the input variables + for arg in args_with_derivatives: + arg_name = arg.name + + def repl(m: Any) -> str: + return f"{m.group(1)}{arg_name}_p{m.group(2)}" + + fw_formula = re.sub(IDENT_REGEX.format(arg_name), repl, fw_formula) + + # Do the final conjugate 3) + fw_formula = f"({fw_formula}).conj()" + + # Since there is a single differentiable inputs and we necessarily need its tangent we can + # simply require all differentiable input's tangent. + required_inputs_tangent = tuple(all_arg_names) + formula = fw_formula + elif formula == "auto_linear": + if ( + len(forward_derivatives) > 1 + or len(forward_derivatives[0].var_names) > 1 + ): + raise RuntimeError( + f"Derivative definition of {defn_name} in derivatives.yaml defines the " + "forward definition of gradient as linear but this only works " + "for functions with a single differentiable output." + ) + # This transformation is based on the observation that linear functions can be written as: + # y = f(x) = A * x + # For some matrix A and the Jacobian of the function f is also A. + # So doing J * v = A * v = f(v). + # Hence to do the jvp, we simply need to evaluate the function at the point v instead of x. + # We do this by calling the forward again by replacing any occurrence of the differentiable + # input "foo" by it's tangent "foo_t". + # Note that multiple inputs are not a problem as long as the function is truly linear wrt to + # the vector where all the differentiable inputs are stacked. + + diff_arg_names = [arg.name for arg in args_with_derivatives] + assert len(diff_arg_names) > 0 + + # Do replacement of input variables + new_args = [] + for arg_name in all_arg_names: + if arg_name in diff_arg_names: + arg_name = arg_name + "_t" + # pyrefly: ignore [bad-argument-type] + new_args.append(arg_name) + + # TODO we are trolling + if f.func.has_symint(): + defn_name += "_symint" + + # Call into the forward again. We need two cases here to handle both Tensor methods and at:: functions. + if Variant.function in f.variants: + fw_formula = f"at::{defn_name}({', '.join(new_args)})" + else: + assert Variant.method in f.variants + fw_formula = f"{new_args[0]}.{defn_name}({', '.join(new_args[1:])})" + + # All of the input tangents are always used so all of them are required here. + required_inputs_tangent = tuple(diff_arg_names) + formula = fw_formula + + # At this point, the formula is final and is not modified anymore. + + # During forward formula, we use the primal instead of the input Tensors. + # This call inspects the formula to find for which input's primal are used. 
+ required_inputs_primal = find_required_inputs(formula, "_p") + + updated_derivatives.append( + ForwardDerivative( + formula=formula, + var_names=defn.var_names, + var_types=defn.var_types, + required_inputs_fw_grad=required_inputs_tangent, + required_inputs_primal=required_inputs_primal, + required_original_self_value=False, + is_reusing_outplace_formula=False, + ) + ) + + return updated_derivatives + + +def is_forward_derivative_definition( + all_arg_names: list[str], names: tuple[str, ...] +) -> bool: + for name in names: + return name not in all_arg_names + raise RuntimeError("Expected `names` to be non-empty") + + +def create_differentiability_info( + defn_dict: dict[Any, Any], + functions_by_signature: dict[FunctionSchema, list[NativeFunction]], + functions_by_schema: dict[str, NativeFunction], + op_counter: Counter[str], + used_dispatch_keys: set[str], +) -> tuple[FunctionSchema, dict[str, DifferentiabilityInfo]]: + """Processes a single entry `defn` in derivatives.yaml""" + + def canonical_function( + functions: Sequence[NativeFunction], name: str + ) -> NativeFunction: + for f in functions: + if ( + not f.func.is_functional_fn() + and not f.func.is_out_fn() + and name == str(f.func.name.name) + ): + return f + # some functions only have in-place variants + assert name + "_" == cpp.name(functions[0].func) + return functions[0] + + def split_names(raw_names: str) -> tuple[str, ...]: + """Given "foo, bar", return ["foo", "bar"].""" + return tuple(x.strip() for x in raw_names.split(",")) + + def check_grad_usage(defn_name: str, derivatives: Sequence[Derivative]) -> None: + """ + Check for some subtle mistakes one might make when writing derivatives. + These mistakes will compile, but will be latent until a function is + used with double backwards. + """ + + uses_grad = False # true if any derivative uses "grad" + num_grads_uses = 0 # count of uses of "grads" or "grads[INDEX]" + uses_named_grads = False # true if any derivative uses "grad_{name}" + used_grads_indices: list[int] = [] # which indices of grads are used + for d in derivatives: + formula = d.formula + uses_grad = uses_grad or bool( + re.findall(IDENT_REGEX.format("grad"), formula) + ) + num_grads_uses += len(re.findall(IDENT_REGEX.format("grads"), formula)) + uses_named_grads = uses_named_grads or bool(d.named_gradients) + used_grads_indices.extend(used_gradient_indices(formula)) + # This is a basic sanity check: the number of places we see + # "grads" should be no fewer than the number of indices we see + # inside "grads". They may not be equal because we may use + # "grads" without an index. + assert num_grads_uses >= len(used_grads_indices) + # Thus if the number is equal, every use of grads is also + # indexed. + only_used_grads_indices = num_grads_uses == len(used_grads_indices) + + if uses_grad and num_grads_uses > 0: + raise RuntimeError( + f"Derivative definition of {defn_name} in derivatives.yaml illegally " + "mixes use of 'grad' and 'grads'. Consider replacing " + "occurrences of 'grad' with 'grads[0]'" + ) + + if only_used_grads_indices and set(used_grads_indices) == {0}: + raise RuntimeError( + f"Derivative definition of {defn_name} in derivatives.yaml solely " + "refers to 'grads[0]'. If the first output is indeed the " + "only differentiable output, replace 'grads[0]' with 'grad'; " + "otherwise, there is a likely error in your derivatives " + "declaration." 
+ ) + + if uses_named_grads and (uses_grad or num_grads_uses > 0): + raise RuntimeError( + f"Derivative definition of {defn_name} in derivatives.yaml illegally " + 'mixes use of "grad_RETURN_NAME" and "grad" or "grads[x]". Use ' + "only one method for identifying gradients." + ) + + @with_native_function + def set_up_derivatives( + f: NativeFunction, + ) -> tuple[ + Sequence[Derivative], + Sequence[ForwardDerivative], + Sequence[Binding], + Sequence[str], + Sequence[str], + ]: + # Set up the derivative information + derivatives: list[Derivative] = [] + forward_derivatives: list[ForwardDerivative] = [] + non_differentiable_arg_names: list[str] = [] + args_with_derivatives_set: set[str] = set() + + all_arg_names = [a.name for a in cpp_arguments(f)] + all_ret_names = [ + r.name for r in f.func.returns + ] # only used for the assert below + # output_differentiability is captured from the enclosed + # scope. Don't modify it. + # + # If it is not present, then no output is explicitly + # undifferentiable. + # + # It may be present and shorter than the length of return + # values. If that's the case, any return value that does not + # have a corresponding entry is considered not differentiable. + differentiability = output_differentiability or [True] * len(f.func.returns) + # A return is available as a named gradient ... + available_named_gradients = [ + f"grad_{ret.name}" + for ret, differentiable in zip(f.func.returns, differentiability) + # if it has not been explicitly made undifferentiable + if differentiable + # and if it has a name + and ret.name is not None + # and if its type is differentiable + and ret.type.is_tensor_like() + ] + + for raw_names in sorted(defn.keys()): + formula = defn[raw_names] + names = split_names(raw_names) + + for name in names: + assert not (name in all_arg_names and name in all_ret_names), ( + f"While processing the derivative formula for '{f.func.name}' wrt '{name}', " + f"expected '{name}' to not be both an input arg and named return. " + ) + + if is_forward_derivative_definition(all_arg_names, names): + forward_derivatives.append(create_forward_derivative(f, formula, names)) + else: + if formula.lower().strip() == "non_differentiable": + non_differentiable_arg_names += names + else: + derivative = create_derivative( + f, formula, names, available_named_gradients + ) + derivatives.append(derivative) + args_with_derivatives_set |= set(names) + + overlap = args_with_derivatives_set.intersection(non_differentiable_arg_names) + if overlap: + raise RuntimeError( + f"derivatives definition for {defn} have overlapped non_differentiable " + f"and differentiable variables: {overlap}" + ) + + # Next, let us determine the list of inputs in order. + # TODO: do we need eagerly calculate and save it here? Can it be derived + # from NativeFunction and `derivatives` on callsites instead? + args_with_derivatives = [ + a for a in cpp_arguments(f) if a.name in args_with_derivatives_set + ] + + # Postprocess forward derivatives definitions now that we know the differentiable arguments + forward_derivatives = postprocess_forward_derivatives( + f, + defn_name, + all_arg_names, + derivatives, + forward_derivatives, + args_with_derivatives, + ) + + # Test to see if the use of 'grads' makes sense. 
+ check_grad_usage(defn_name, derivatives) + + return ( + derivatives, + forward_derivatives, + args_with_derivatives, + non_differentiable_arg_names, + available_named_gradients, + ) + + # NB: Removes 'name' from defn dictionary + specification = defn_dict.pop("name") + defn_name, _ = split_name_params(specification) + # NB: Removes 'output_differentiability' from defn dictionary + # `None` means all differentiable. + output_differentiability = defn_dict.pop("output_differentiability", None) + output_differentiability_conditions = None + if output_differentiability and any( + isinstance(diff, str) for diff in output_differentiability + ): + if len(output_differentiability) != 1: + raise RuntimeError( + f"Not supported: for {specification}," + f"output_differentiability must either be " + f"list[bool] or a list[str] where each str is a " + f"condition. In the case where it is a condition, " + f"we only support single-output functions. " + f"Please file us an issue. " + ) + output_differentiability_conditions = output_differentiability + output_differentiability = [True] + + schema_function = functions_by_schema.get(specification) + if not schema_function: + avail = "\n".join( + k for k, v in functions_by_schema.items() if cpp.name(v.func) == defn_name + ) + raise RuntimeError( + f"could not find ATen function for schema: {specification} " + f". Available signatures:\n{avail}" + ) + + # now map this to the legacy schema; this isn't technically necessary, but we'd need some logic here + # to map in-place schemas to the out-of-place variants. + # TODO: maybe the logic to handle the legacy schema is no longer necessary? + signature = schema_function.func.signature() + functions = functions_by_signature[signature] + if len(functions) == 0: + avail = "\n".join( + str(k) + for k, v in functions_by_signature.items() + if cpp.name(k) == defn_name + ) + raise RuntimeError( + f"could not find ATen function for legacy signature: {signature} " + f"corresponding to schema {specification}. Please report a bug to PyTorch. " + f"Available signatures:\n{avail}" + ) + + canonical = canonical_function(functions, defn_name) + if "grad_input_mask" in (a.name for a in cpp_arguments(canonical)): + raise RuntimeError( + f"Schema for {defn_name} has an argument named grad_input_mask, " + "but this name would be shadowed by our codegen. " + "Please use a different name in native_functions.yaml." + ) + + if "result" in (a.name for a in cpp_arguments(canonical)): + raise RuntimeError( + f"Schema for {defn_name} has an argument named result, " + "but this is only allowed for outputs." + "Please use a different name in native_functions.yaml." 
+ ) + + diffinfo_dict = {} + for key, defn in defn_dict["dispatch"].items(): + if key != "Default" and key not in _VALID_AUTOGRAD_KEYS: + raise RuntimeError( + f"Invalid dispatch key {key} in derivatives.yaml for {specification}," + f" expected key to be one of {_VALID_AUTOGRAD_KEYS}" + ) + if key not in used_dispatch_keys: + used_dispatch_keys.add(key) + + ( + derivatives, + forward_derivatives, + args_with_derivatives, + non_differentiable_arg_names, + available_named_gradients, + ) = set_up_derivatives(canonical) + + used_named_gradients: set[str] = set() + for d in derivatives: + used_named_gradients |= d.named_gradients + + # only assign an op name if we are actually going to calculate a derivative + op = None + if args_with_derivatives: + op_prefix = _create_op_prefix(defn_name) + if key != "Default": + op_prefix = op_prefix + key + op = f"{op_prefix}{op_counter[op_prefix]}" + op_counter[op_prefix] += 1 + + diffinfo_dict[key] = DifferentiabilityInfo( + name=defn_name, + func=canonical, + op=op, + derivatives=derivatives, + forward_derivatives=forward_derivatives, + all_saved_inputs=dedup_vars( + [v for d in derivatives for v in d.saved_inputs] + ), + all_saved_outputs=dedup_vars( + [v for d in derivatives for v in d.saved_outputs] + ), + available_named_gradients=available_named_gradients, + used_named_gradients=used_named_gradients, + args_with_derivatives=args_with_derivatives, + non_differentiable_arg_names=non_differentiable_arg_names, + output_differentiability=output_differentiability, + output_differentiability_conditions=output_differentiability_conditions, + ) + + return canonical.func, diffinfo_dict + + +GRAD_INDEX_REGEX = r"(?:^|\W)grads\[(\d+)\]" + + +def used_gradient_indices(formula: str) -> list[int]: + """Determine a list of gradient indices (the i in grads[i]) that + are used by the formula. + + >>> used_gradient_indices("foo(grads[0], grads[1])") + [0, 1] + """ + return [int(i) for i in re.findall(GRAD_INDEX_REGEX, formula)] + + +def saved_variables( + formula: str, + nctypes: list[NamedCType], + var_names: tuple[str, ...], +) -> tuple[str, tuple[SavedAttribute, ...]]: + def stride_expr(name: str) -> str: + assert var_names == (name,), ( + 'Replacement for ".strides()" is currently only supported for single derivatives of the same tensor ' + 'that ".strides()" is being called on.' + ) + return f'strides_or_error({name}, "{name}")' + + REPLACEMENTS: list[tuple[str, dict[str, Any]]] = [ + # replace self.sym_sizes() with self_sym_sizes + ( + r"{}.sym_sizes\(\)", + { + "suffix": "_sym_sizes", + "nctype": lambda name: NamedCType(name, BaseCType(symIntArrayRefT)), + }, + ), + # replace self->sym_sizes() with self_sym_sizes_opt + ( + r"{}->sym_sizes\(\)", + { + "suffix": "_sym_sizes_opt", + "nctype": lambda name: NamedCType( + name, OptionalCType(BaseCType(symIntArrayRefT)) + ), + "expr": lambda name: f"{name}.has_value() ? 
std::optional({name}->sym_sizes()) : std::nullopt", + }, + ), + # replace self.sym_blocksize() with self_sym_blocksize_opt + ( + r"{}.sym_blocksize\(\)", + { + "suffix": "_self_sym_blocksize_opt", + "nctype": lambda name: NamedCType( + name, OptionalCType(BaseCType(symIntArrayRefT)) + ), + "expr": lambda name: f"at::sparse_csr::getSymIntBlockSize({name})", + }, + ), + # replace self.options() with self_options + ( + r"{}.options\(\)", + { + "suffix": "_options", + "nctype": lambda name: NamedCType(name, BaseCType(tensorOptionsT)), + }, + ), + # replace zeros_like(self) with self_info + ( + r"zeros_like\({}\)", + { + "suffix": "_info", + "nctype": lambda name: NamedCType(name, BaseCType(typeAndSizeT)), + "expr": lambda name: name, # at save-time + "res": lambda name: name + "_info.zeros()", # at eval-time + }, + ), + # replace self.sym_size(2) with self_sym_size_2 + ( + r"{}.sym_size\((-?\w+)\)", + { + "suffix": lambda m: f"_sym_argsize_{m.groups()[0].replace('-', 'minus_')}", + "nctype": lambda name: NamedCType(name, BaseCType(SymIntT)), + }, + ), + # replace self.numel() with self_numel + ( + r"{}.numel\(\)", + { + "suffix": "_numel", + "nctype": lambda name: NamedCType(name, BaseCType(longT)), + }, + ), + # replace self.sym_numel() with self_sym_numel + ( + r"{}.sym_numel\(\)", + { + "suffix": "_sym_numel", + "nctype": lambda name: NamedCType(name, BaseCType(SymIntT)), + }, + ), + # replace to_args_sizes(self) with self_args_sizes + ( + r"to_args_sizes\({}\)", + { + "suffix": "_args_sizes", + "nctype": lambda name: NamedCType( + name, VectorCType(VectorCType(BaseCType(longT))) + ), + }, + ), + # replace to_args_sizes_symint(self) with self_args_sizes + ( + r"to_args_sizes_symint\({}\)", + { + "suffix": "_args_sizes_symint", + "nctype": lambda name: NamedCType( + name, VectorCType(VectorCType(BaseCType(SymIntT))) + ), + }, + ), + # replace to_args_scalartypes(self) with self_args_scalartypes + ( + r"to_args_scalartypes\({}\)", + { + "suffix": "_args_scalartypes", + "nctype": lambda name: NamedCType( + name, VectorCType(BaseCType(scalarTypeT)) + ), + }, + ), + # replace TensorGeometry(self) with self_geometry + ( + r"TensorGeometry\({}\)", + { + "suffix": "_geometry", + "nctype": lambda name: NamedCType(name, BaseCType(tensorGeometryT)), + }, + ), + ( + r"{}.scalar_type\(\)", + { + "suffix": "_scalar_type", + "nctype": lambda name: NamedCType(name, BaseCType(scalarTypeT)), + }, + ), + # replace self.dim() with self_dim + ( + r"{}.dim\(\)", + { + "suffix": "_dim", + "nctype": lambda name: NamedCType(name, BaseCType(longT)), + }, + ), + # replace self.sym_strides() with self_sym_strides + ( + r"{}.sym_strides\(\)", + { + "suffix": "_sym_strides", + "nctype": lambda name: NamedCType(name, BaseCType(symIntArrayRefT)), + "expr": stride_expr, + }, + ), + # replace self.layout() with self_layout + ( + r"{}.layout\(\)", + { + "suffix": "_layout", + "nctype": lambda name: NamedCType(name, BaseCType(layoutT)), + }, + ), + # replace self.is_conj() with self_conjugate + ( + r"{}.is_conj\(\)", + { + "suffix": "_conjugate", + "nctype": lambda name: NamedCType(name, BaseCType(boolT)), + }, + ), + ] + + # find which arguments need to be saved + saved: list[SavedAttribute] = [] + + if ".sizes()" in formula or "->sizes()" in formula: + raise RuntimeError( + ".sizes() is not supported in derivative formulas. Instead, please use the SymInt version," + + f".sym_sizes(), which returned a c10::SymIntArrayRef. 
formula={formula}" + ) + if re.search(r"\.size\([-]?\d+\)", formula) or re.search( + r"->size\([-]?\d+\)", formula + ): + raise RuntimeError( + ".size(int) is not supported in derivative formulas. Instead, please use the SymInt version," + + f".sym_size(int), which returned a c10::SymIntArrayRef. formula={formula}" + ) + if ".strides()" in formula or "->strides()" in formula: + raise RuntimeError( + ".strides() is not supported in derivative formulas. Instead, please use the SymInt version," + + f".sym_strides(), which returned a c10::SymIntArrayRef. formula={formula}" + ) + for nctype in nctypes: + # pyrefly: ignore [bad-assignment] + name = ( + nctype.name.name if isinstance(nctype.name, SpecialArgName) else nctype.name + ) + # First search the formula for expressions which can be evaluated + # when the autograd Function is created to avoid saving variables + for regex, info in REPLACEMENTS: + + def repl(m: re.Match[str]) -> str: + suffix: str = ( + # pyrefly: ignore [bad-assignment] + info["suffix"](m) if callable(info["suffix"]) else info["suffix"] + ) + expr: str = info["expr"](name) if "expr" in info else m.group(0) + saved.append( + SavedAttribute( + nctype=info["nctype"](name + suffix), + expr=expr, + ) + ) + if "res" in info: + replacement: str = info["res"](name) + return replacement + return name + suffix + + formula = re.sub(regex.format(name), repl, formula) + + # std::optional types stored in Backward nodes must be + # converted to std::optional before being passed into + # the backward function + if nctype.type == OptionalCType(BaseCType(stringT)): + formula = re.sub( + rf"\b{name}\b", + f"{name}.has_value() ? std::optional({name}.value()) : std::nullopt", + formula, + ) + + # Find any variables which remain in the formula and save them + if re.search(IDENT_REGEX.format(name), formula): + saved.append( + SavedAttribute( + nctype=nctype, + expr=name, + ) + ) + + return formula, tuple(saved) + + +def _create_op_prefix(name: str) -> str: + r"""Takes a native function name converts to an op prefix name. + + Note that the "name" parameter must be the native function name + without the optional variant suffix, so "add" instead of + "add.out". + + OP names correspond to classes, hence the change to title case. 
+ + Example:: + + >>> _create_op_prefix("add") + 'AddBackward' + """ + camel_case = "".join([p.title() for p in name.split("_")]) + return (camel_case + "Backward").replace("ForwardBackward", "Backward") + + +def dedup_vars(vars: Sequence[SavedAttribute]) -> Sequence[SavedAttribute]: + seen: set[str] = set() + saved: list[SavedAttribute] = [] + for var in vars: + name = ( + var.nctype.name.name + if isinstance(var.nctype.name, SpecialArgName) + else var.nctype.name + ) + if name in seen: + continue + seen.add(name) + saved.append(var) + return saved diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/ADInplaceOrViewType.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/ADInplaceOrViewType.cpp new file mode 100644 index 0000000000000000000000000000000000000000..e8276697eee065a36d1b16e583a5f011f92541c2 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/ADInplaceOrViewType.cpp @@ -0,0 +1,38 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include "torch/csrc/autograd/VariableTypeUtils.h" +#include "torch/csrc/autograd/generated/ViewFuncs.h" + +#include +#include +#include + +// ${generated_comment} + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +$ops_headers +#endif + +using namespace at; +using torch::autograd::CreationMeta; +using torch::autograd::as_view; +using torch::autograd::increment_version; + +namespace torch { + +namespace ADInplaceOrView { + +namespace { +${inplace_or_view_method_definitions} +} // namespace +} // namespace ADInplaceOrView + +namespace { + +TORCH_LIBRARY_IMPL(aten, ADInplaceOrView, m) { + ${inplace_or_view_wrapper_registrations}; +} + +} // namespace +} // namespace torch diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/Functions.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/Functions.cpp new file mode 100644 index 0000000000000000000000000000000000000000..ba5cb3d912c5d7a3bbf31f4b0d38d4413dfc160c --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/Functions.cpp @@ -0,0 +1,44 @@ +#include "torch/csrc/autograd/FunctionsManual.h" +#include "torch/csrc/dynamo/compiled_autograd.h" + +// ${generated_comment} + +// The manual function definitions that used to be here are now in torch/csrc/autograd/FunctionsManual.cpp +// This speeds up re-compilation and allow to share these implementations so that they can be +// used for forward mode AD formulas as well. 
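A note on mechanics, since this and the following template files are all consumed the same way: the `${...}` holes are filled by the Python generators shown earlier via `torchgen.code_template.CodeTemplate`. A minimal sketch, with a made-up `defs` placeholder standing in for bindings like `autograd_function_definitions`:

    from torchgen.code_template import CodeTemplate

    t = CodeTemplate(
        """\
    namespace torch::autograd::generated {
    ${defs}
    } // namespace torch::autograd::generated
    """
    )
    # List-valued bindings expand one element per line, which is how lists
    # of generated definitions become the bodies of these .cpp files.
    print(t.substitute(defs=["// definition one", "// definition two"]))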
+
+using namespace torch::autograd::generated::details;
+using at::Tensor;
+using at::Scalar;
+using at::IntArrayRef;
+using at::TensorList;
+
+namespace torch::autograd::generated {
+
+static at::IValue compute_output_metadata(const torch::autograd::edge_list& next_edges) {
+  auto output_metadata = torch::dynamo::autograd::IValuePacker<
+      std::vector<std::optional<InputMetadata>>>::pack(
+      torch::dynamo::autograd::get_input_metadata(next_edges));
+  return output_metadata;
+}
+
+static C10_NOINLINE variable_list compiled_autograd_apply_functional(
+    const PackedArgs& packed_args,
+    const edge_list& next_edges,
+    SwapSavedVariables& saved,
+    const variable_list& grads,
+    const std::string& name) {
+  auto output_metadata = compute_output_metadata(next_edges);
+  const auto& pyinterface = torch::dynamo::autograd::getPyCompilerInterface();
+  return pyinterface->call_function(
+      saved.get_py_compiler(),
+      "apply_functional",
+      name,
+      grads,
+      packed_args.vec(),
+      output_metadata);
+}
+
+${autograd_function_definitions}
+
+} // namespace torch::autograd::generated
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/Functions.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/Functions.h
new file mode 100644
index 0000000000000000000000000000000000000000..911d7d905c002b29941167ccff112a8079d48266
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/Functions.h
@@ -0,0 +1,51 @@
+#pragma once
+
+// ${generated_comment}
+
+#include <ATen/ATen.h>
+#include <ATen/core/functional.h>
+#include <ATen/TensorGeometry.h>
+
+#include "torch/csrc/autograd/function.h"
+#include "torch/csrc/autograd/variable.h"
+#include "torch/csrc/autograd/saved_variable.h"
+#include <torch/csrc/Export.h>
+
+#include <c10/core/SymIntArrayRef.h>
+
+namespace torch { namespace autograd { namespace generated {
+
+using at::Scalar;
+using at::Tensor;
+using at::IntArrayRef;
+using at::ArrayRef;
+using at::Type;
+using at::TensorGeometry;
+using at::ScalarType;
+using std::optional;
+using c10::fmap;
+
+inline std::vector<Tensor> unpack_list(at::ArrayRef<SavedVariable> xs, std::shared_ptr<Node> saved_for = nullptr) {
+  // NB: we must explicitly do the conversion in the lambda, otherwise template
+  // deduction will give a Tensor of Variable which is not convertible
+  return fmap(xs, [&saved_for](const SavedVariable& x) {
+    // TODO(crcrpar): Use `std::move(saved_for)` to avoid incrementing refcount, which would need refactoring.
+    return static_cast<Tensor>(x.unpack(saved_for));
+  });
+}
+
+inline c10::List<std::optional<Tensor>> unpack_opt_list(at::ArrayRef<SavedVariable> xs, std::shared_ptr<Node> saved_for = nullptr) {
+  torch::List<std::optional<Tensor>> result;
+  result.reserve(xs.size());
+  for (const SavedVariable& v : xs) {
+    auto var = v.unpack(saved_for);
+    result.push_back(var.defined() ?
std::optional(var) : ::std::nullopt); + } + return result; +} + +using torch::autograd::TypeAndSize; + +${autograd_function_declarations} + +}}} // namespace torch::autograd::generated diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/TraceType.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/TraceType.cpp new file mode 100644 index 0000000000000000000000000000000000000000..fb5e7ae44a5353a3cc2a90858fe33b7fc0ef8bfd --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/TraceType.cpp @@ -0,0 +1,40 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include "torch/csrc/jit/frontend/tracer.h" + +#include + +#include "torch/csrc/autograd/function.h" + +#include "ATen/quantized/Quantizer.h" + +// ${generated_comment} + +// See the `Tracer` section in `torch/csrc/jit/OVERVIEW.md`. +// NOTE See [Sharded File] comment in VariableType + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +$ops_headers +#endif + +using namespace at; + +namespace torch { + +namespace TraceType { + +namespace { +${trace_method_definitions} +} // namespace +} // namespace TraceType + +namespace { + +TORCH_LIBRARY_IMPL(aten, Tracer, m) { + ${trace_wrapper_registrations}; +} + +} // namespace + +} // namespace torch diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/VariableType.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/VariableType.cpp new file mode 100644 index 0000000000000000000000000000000000000000..d1de108283b1169902a085e4886de7a0113c309c --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/VariableType.cpp @@ -0,0 +1,77 @@ +#include "torch/csrc/autograd/VariableTypeUtils.h" +#include "torch/csrc/autograd/generated/VariableType.h" +#include "torch/csrc/autograd/FunctionsManual.h" + +#include +#include +#include +#include + +#include + + +// ${generated_comment} + +// NOTE [Sharded File]: on this file's split-into-shards state +// +// Back in the good old days, VariableType.cpp was generated as one +// file with every function in it, and everything was great and +// simple. +// +// However, this file was also very large (over 36,000 lines), and +// compiling it was very slow, and in fact was a significant +// bottleneck for incremental rebuilds. To address this, we now +// generate the file split across multiple shards, named +// VariableType_0.cpp and so on, which can be compiled in parallel. +// +// For ease of inspection and debugging, so that it's not necessary to +// go rooting around in multiple files, we also generate all the +// functions together in VariableTypeEverything.cpp. This generated +// file is only for convenience; it's not actually used in the +// build. If the file you're looking at now is one of the shards, you +// may want to switch over to the Everything variant to make you +// grepping smoother. 
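The sharding described in this note is driven from the Python side of the generator. A rough, assumption-flagged sketch of the call shape via `FileManager.write_sharded` (the item list, shard count, and env keys here are illustrative stand-ins, not the real generator's values):

    from torchgen.utils import FileManager

    fm = FileManager(install_dir="out", template_dir="templates", dry_run=True)
    fm.write_sharded(
        "VariableType.cpp",
        items=["add", "mul", "view"],  # stand-ins for native functions
        key_fn=lambda name: name,      # stable key => stable shard assignment
        env_callable=lambda name: {
            "type_derived_method_definitions": [f"// code for {name}"],
            "wrapper_registrations": [f"// registration for {name}"],
        },
        num_shards=2,
        sharded_keys={"type_derived_method_definitions", "wrapper_registrations"},
    )
    # Emits VariableType_0.cpp, VariableType_1.cpp, plus the convenience
    # VariableTypeEverything.cpp; dry_run=True only records the filenames.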
+ +using namespace at; +using namespace torch::autograd::generated; +using namespace torch::autograd::generated::details; + + +namespace torch::autograd { + +namespace VariableType { +namespace{ +[[maybe_unused]] void reset_grad_accumulator(Variable& self) { + AutogradMeta* meta = torch::autograd::impl::get_autograd_meta(self); + if (meta != nullptr) { + meta->grad_accumulator_.reset(); + } +} +[[maybe_unused]] size_t expected_fresh_use_count(const Variable& self) { + if (!self.defined()) { + // An UndefinedTensorImpl always has a use count of 0 + return 0; + } + if (self.unsafeGetTensorImpl()->pyobj_slot()->load_pyobj() != nullptr) { + // A TensorImpl with a Python object has a use count of 2 + return 2; + } + // A fresh TensorImpl (with no PyObject) has a use count of 1 + return 1; +} +} + +namespace { + + +${type_derived_method_definitions} +} +} + +namespace { + +${wrapper_registrations} + +} + +} // namespace torch::autograd diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/VariableType.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/VariableType.h new file mode 100644 index 0000000000000000000000000000000000000000..02959757e5c007a7d54526dc2ca18698748e95f1 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/VariableType.h @@ -0,0 +1,55 @@ +#pragma once + +// ${generated_comment} + +#include +#include + +#include + +#include +#include + +#include // for size_t +#include // for function +#include // for unique_ptr +#include +#include + +namespace at { + struct Quantizer; +} + +namespace torch { namespace autograd { + +using Variable = at::Tensor; +using at::Context; +using at::Device; +using at::Dimname; +using at::DimnameList; +using at::Generator; +using at::IntArrayRef; +using at::MemoryFormat; +using at::QScheme; +using at::Scalar; +using at::ScalarType; +using at::Storage; +using at::Tensor; +using at::TensorList; +using at::TensorOptions; +using at::Quantizer; +using std::optional; + +namespace VariableType { + TORCH_API std::vector allCUDATypes(); + TORCH_API std::vector allXPUTypes(); + TORCH_API std::vector allCPUTypes(); + TORCH_API std::vector allPrivateUser1Types(); + + at::Tensor & unpack(Tensor & t, const char * name, int pos); + const at::Tensor & unpack(const Tensor & t, const char * name, int pos); + at::Tensor unpack_opt(const Tensor & t, const char * name, int pos); + std::vector unpack(const at::ITensorListRef& tl, const char *name, int pos); +} + +}} // namespace torch::autograd diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/ViewFuncs.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/ViewFuncs.cpp new file mode 100644 index 0000000000000000000000000000000000000000..11b9b194fb46f924e863c4c1dab5cbb8dbb0601b --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/ViewFuncs.cpp @@ -0,0 +1,14 @@ +#include + +// ${generated_comment} + +using at::Tensor; +using at::Scalar; +using at::IntArrayRef; +using at::TensorList; + +namespace torch::autograd::generated { + +${view_func_definitions} + +} // namespace torch::autograd::generated diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/ViewFuncs.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/ViewFuncs.h new file mode 100644 index 
0000000000000000000000000000000000000000..1f69c062d344e4cd5f98cf5f34fd4278019fdf8a --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/ViewFuncs.h @@ -0,0 +1,28 @@ +#pragma once + +// ${generated_comment} + +#include +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +$ops_headers +#endif + +namespace torch::autograd::generated { + +using at::Scalar; +using at::Tensor; +using at::IntArrayRef; +using at::ArrayRef; +using at::Type; +using at::ScalarType; +using std::optional; +using c10::fmap; + +${view_func_declarations} + +} // namespace torch::autograd::generated diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/annotated_fn_args.py.in b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/annotated_fn_args.py.in new file mode 100644 index 0000000000000000000000000000000000000000..1012c008451745b8f1ed1454a864f666caf2618a --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/annotated_fn_args.py.in @@ -0,0 +1,11 @@ +""" +This file is needed for generating procedural tests required for +testing __torch_function__. See tests/test_overrides.py. +""" + +# flake8: noqa +import torch + +annotated_args = { +${annotated_args} +} diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_enum_tag.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_enum_tag.cpp new file mode 100644 index 0000000000000000000000000000000000000000..83cfad1d7ba4d6fc3529caf78e036c5883e7bc23 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_enum_tag.cpp @@ -0,0 +1,15 @@ +#include +#include +#include +#include + +namespace py = pybind11; +namespace torch { + namespace autograd { + void initEnumTag(PyObject* module) { + auto m = py::handle(module).cast(); + py::enum_(m, "Tag") + ${enum_of_valid_tags}; + m.doc() = "An Enum that contains tags that can be assigned to an operator registered in C++."; + } +}} diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_fft_functions.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_fft_functions.cpp new file mode 100644 index 0000000000000000000000000000000000000000..71ac4e2226d2db418eba5690995424d3f007e620 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_fft_functions.cpp @@ -0,0 +1,81 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +// ${generated_comment} + +#include "torch/csrc/Device.h" +#include "torch/csrc/DynamicTypes.h" +#include "torch/csrc/Exceptions.h" +#include "torch/csrc/autograd/python_fft_functions.h" +#include "torch/csrc/autograd/generated/python_return_types.h" +#include "torch/csrc/autograd/python_variable.h" +#include "torch/csrc/autograd/utils/wrap_outputs.h" +#include "torch/csrc/autograd/utils/python_arg_parsing.h" +#include "torch/csrc/autograd/generated/variable_factories.h" +#include "torch/csrc/utils/out_types.h" +#include "torch/csrc/utils/pycfunction_helpers.h" +#include "torch/csrc/utils/python_arg_parser.h" +#include "torch/csrc/utils/structseq.h" +#include "torch/csrc/utils/device_lazy_init.h" + +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +$ops_headers +#endif + +using at::Tensor; +using at::Device; +using at::Layout; +using at::Scalar; +using 
at::ScalarType; +using at::Backend; +using at::OptionalDeviceGuard; +using at::DeviceGuard; +using at::TensorOptions; +using at::IntArrayRef; +using at::Generator; +using at::TensorList; +using at::Dimname; +using at::DimnameList; + +using torch::utils::check_out_type_matches; +using namespace torch::autograd::utils; + +namespace torch::autograd { + +// generated forward declarations start here + +${py_forwards} + +static PyMethodDef fft_functions[] = { + ${py_method_defs} + {NULL} +}; + +static PyObject* THPFFTVariableFunctionsModule = NULL; + +void initFFTFunctions(PyObject* module) { + static struct PyModuleDef def = { + PyModuleDef_HEAD_INIT, + "torch._C._fft", + NULL, + -1, + fft_functions + }; + PyObject* fft = PyModule_Create(&def); + THPFFTVariableFunctionsModule = fft; + if (!fft) { + throw python_error(); + } + // steals a reference to fft + if (PyModule_AddObject(module, "_fft", fft) != 0) { + throw python_error(); + } +} + +// generated methods start here + +${py_methods} + +} // namespace torch::autograd diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_functions.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_functions.cpp new file mode 100644 index 0000000000000000000000000000000000000000..1522d6cd0f5a2a1fc0188bf9d6d0d59fe1b27d85 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_functions.cpp @@ -0,0 +1,37 @@ +#include + +// ${generated_comment} + +#include +#include + +#include +#include "torch/csrc/autograd/generated/Functions.h" +#include "torch/csrc/autograd/python_cpp_function.h" +#include +#include +#include +#include +#include + +// NOTE: See [Sharded File] comment in VariableType + +namespace torch::autograd::generated { + +template +static void addClass(PyObject* module, PyTypeObject& type, const char* name, + PyGetSetDef* function_properties=NULL, PyMethodDef* function_methods=NULL) +{ + _initFunctionPyTypeObject(type, name, function_properties, function_methods); + Py_INCREF(&type); + PyModule_AddObject(module, name, (PyObject*)&type); + registerCppFunction(typeid(C), &type); +} + +${py_function_props_and_getters} + +void initialize_autogenerated_functions${shard_id}(PyObject* module) { + ${py_function_initializers} +} + +} // namespace torch::autograd::generated diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_functions.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_functions.h new file mode 100644 index 0000000000000000000000000000000000000000..22e37207e219431100fefaf21b02e3ed0f63d956 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_functions.h @@ -0,0 +1,17 @@ +#pragma once + +#include + +// ${generated_comment} + +// Python bindings for automatically generated autograd functions + +namespace torch { namespace autograd { namespace generated { + +${shard_forward_declare} + +inline void initialize_autogenerated_functions(PyObject* module) { + ${shard_call} +} + +}}} // namespace torch::autograd::generated diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_linalg_functions.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_linalg_functions.cpp new file mode 100644 index 0000000000000000000000000000000000000000..c93752a3ddbfcf111426f98c3ea68fc625e94def --- 
/dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_linalg_functions.cpp @@ -0,0 +1,68 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +// ${generated_comment} + +#include "torch/csrc/Device.h" +#include "torch/csrc/DynamicTypes.h" +#include "torch/csrc/Exceptions.h" +#include "torch/csrc/autograd/python_linalg_functions.h" +#include "torch/csrc/autograd/generated/python_return_types.h" +#include "torch/csrc/autograd/python_variable.h" +#include "torch/csrc/autograd/utils/wrap_outputs.h" +#include "torch/csrc/autograd/utils/python_arg_parsing.h" +#include "torch/csrc/utils/pycfunction_helpers.h" +#include "torch/csrc/utils/python_arg_parser.h" +#include "torch/csrc/utils/structseq.h" + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +$ops_headers +#endif + +using at::Tensor; +using at::Scalar; +using at::ScalarType; +using at::MemoryFormat; +using at::Generator; +using at::IntArrayRef; +using at::TensorList; + +using namespace torch::autograd::utils; + +namespace torch::autograd { + +// generated forward declarations start here + +${py_forwards} + +static PyMethodDef linalg_functions[] = { + ${py_method_defs} + {NULL} +}; + +static PyObject* THPLinalgVariableFunctionsModule = NULL; + +void initLinalgFunctions(PyObject* module) { + static struct PyModuleDef def = { + PyModuleDef_HEAD_INIT, + "torch._C._linalg", + NULL, + -1, + linalg_functions + }; + PyObject* linalg = PyModule_Create(&def); + THPLinalgVariableFunctionsModule = linalg; + if (!linalg) { + throw python_error(); + } + // steals a reference to linalg + if (PyModule_AddObject(module, "_linalg", linalg) != 0) { + throw python_error(); + } +} + +// generated methods start here + +${py_methods} + +} // namespace torch::autograd diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_nested_functions.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_nested_functions.cpp new file mode 100644 index 0000000000000000000000000000000000000000..3acb5128cee1e180de887080106e7cf5559f15ee --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_nested_functions.cpp @@ -0,0 +1,81 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +// ${generated_comment} + +#include "torch/csrc/Device.h" +#include "torch/csrc/DynamicTypes.h" +#include "torch/csrc/Exceptions.h" +#include "torch/csrc/autograd/python_nested_functions.h" +#include "torch/csrc/autograd/generated/python_return_types.h" +#include "torch/csrc/autograd/python_variable.h" +#include "torch/csrc/autograd/utils/wrap_outputs.h" +#include "torch/csrc/autograd/utils/python_arg_parsing.h" +#include "torch/csrc/autograd/generated/variable_factories.h" +#include "torch/csrc/utils/out_types.h" +#include "torch/csrc/utils/pycfunction_helpers.h" +#include "torch/csrc/utils/python_arg_parser.h" +#include "torch/csrc/utils/structseq.h" +#include "torch/csrc/utils/device_lazy_init.h" + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +$ops_headers +#endif + +using at::Tensor; +using at::Device; +using at::Layout; +using at::Scalar; +using at::ScalarType; +using at::Backend; +using at::OptionalDeviceGuard; +using at::DeviceGuard; +using at::TensorOptions; +using at::IntArrayRef; +using at::OptionalIntArrayRef; +using at::Generator; +using at::TensorList; +using at::Dimname; +using at::DimnameList; + +using namespace torch::autograd::utils; + +namespace torch::autograd { + +// generated forward declarations start 
here + +${py_forwards} + +static PyMethodDef nested_functions[] = { + {NULL, NULL, 0, NULL}, + ${py_method_defs} + {NULL} +}; + +static PyObject* THPNestedVariableFunctionsModule = NULL; + +void initNestedFunctions(PyObject* module) { + nested_functions[0] = get_nested_functions_manual()[0]; + static struct PyModuleDef def = { + PyModuleDef_HEAD_INIT, + "torch._C._nested", + NULL, + -1, + nested_functions + }; + PyObject* nested = PyModule_Create(&def); + THPNestedVariableFunctionsModule = nested; + if (!nested) { + throw python_error(); + } + // steals a reference to nested + if (PyModule_AddObject(module, "_nested", nested) != 0) { + throw python_error(); + } +} + +// generated methods start here + +${py_methods} + +} // namespace torch::autograd diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_nn_functions.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_nn_functions.cpp new file mode 100644 index 0000000000000000000000000000000000000000..8eabb0da2332283a02e98e54dd0a277a83a55ad6 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_nn_functions.cpp @@ -0,0 +1,113 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +// ${generated_comment} + +#include "torch/csrc/Device.h" +#include "torch/csrc/DynamicTypes.h" +#include "torch/csrc/Exceptions.h" +#include "torch/csrc/autograd/python_nn_functions.h" +#include "torch/csrc/autograd/generated/python_return_types.h" +#include "torch/csrc/autograd/python_variable.h" +#include "torch/csrc/autograd/utils/wrap_outputs.h" +#include "torch/csrc/autograd/utils/python_arg_parsing.h" +#include "torch/csrc/utils/pycfunction_helpers.h" +#include "torch/csrc/utils/python_arg_parser.h" +#include "torch/csrc/utils/structseq.h" +#include "torch/csrc/utils/tensor_memoryformats.h" + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +$ops_headers +#endif + +using at::Tensor; +using at::Scalar; +using at::MemoryFormat; +using at::Generator; +using at::IntArrayRef; +using at::ArrayRef; + +using namespace torch::autograd::utils; + +namespace torch::autograd { + +static PyObject* THPNNVariableFunctionsModule = nullptr; + +static PyObject * THPVariable__parse_to(PyObject* module, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "to(Device device=None, ScalarType dtype=None, bool non_blocking=False, bool copy=False, *, MemoryFormat? memory_format=None)", + "to(ScalarType dtype, bool non_blocking=False, bool copy=False, *, MemoryFormat? memory_format=None)", + "to(Tensor tensor, bool non_blocking=False, bool copy=False, *, MemoryFormat? 
memory_format=None)", + }); + ParsedArgs<5> parsed_args; + auto r = parser.parse(args, kwargs, parsed_args); + if (r.has_torch_function()) { + return handle_torch_function(r, args, kwargs, THPNNVariableFunctionsModule, "torch.nn", "_parse_to"); + } + auto parsed = parse_to_conversion(r, /*allow_copy*/ false); // we don't want copy for nn.Module.to + auto& device = std::get<0>(parsed); + auto& scalarType = std::get<1>(parsed); + auto non_blocking = std::get<2>(parsed); + auto opt_memory_format = std::get<4>(parsed); + auto tuple = THPObjectPtr{PyTuple_New(4)}; + if (!tuple) throw python_error(); + if (device) { + PyTuple_SET_ITEM(tuple.get(), 0, THPDevice_New(*device)); + } else { + Py_INCREF(Py_None); + PyTuple_SET_ITEM(tuple.get(), 0, Py_None); + } + if (scalarType) { + PyTuple_SET_ITEM(tuple.get(), 1, Py_NewRef(torch::getTHPDtype(*scalarType))); + } else { + Py_INCREF(Py_None); + PyTuple_SET_ITEM(tuple.get(), 1, Py_None); + } + PyTuple_SET_ITEM(tuple.get(), 2, torch::autograd::utils::wrap(non_blocking)); + if (opt_memory_format.has_value()) { + PyTuple_SET_ITEM(tuple.get(), 3, Py_NewRef(torch::utils::getTHPMemoryFormat(opt_memory_format.value()))); + } else { + Py_INCREF(Py_None); + PyTuple_SET_ITEM(tuple.get(), 3, Py_None); + } + return tuple.release(); + END_HANDLE_TH_ERRORS +} + +// generated forward declarations start here + +${py_forwards} + +static PyMethodDef nn_functions[] = { + {"_parse_to", castPyCFunctionWithKeywords(THPVariable__parse_to), + METH_VARARGS | METH_KEYWORDS, nullptr}, + ${py_method_defs} + {nullptr} +}; + +void initNNFunctions(PyObject* module) { + static struct PyModuleDef def = { + PyModuleDef_HEAD_INIT, + "torch._C._nn", + nullptr, + -1, + nn_functions + }; + PyObject* nn = PyModule_Create(&def); + THPNNVariableFunctionsModule = nn; + if (!nn) { + throw python_error(); + } + // steals a reference to nn + if (PyModule_AddObject(module, "_nn", nn) != 0) { + throw python_error(); + } +} + +// generated methods start here + +${py_methods} + +} // namespace torch::autograd diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_return_types.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_return_types.cpp new file mode 100644 index 0000000000000000000000000000000000000000..139e6b8958336cfcc8328fa33581e9f1ab6d5532 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_return_types.cpp @@ -0,0 +1,52 @@ +#include + +#include +#include +#include + +#include "torch/csrc/autograd/generated/python_return_types.h" +#include "torch/csrc/utils/structseq.h" +#include "torch/csrc/Exceptions.h" + +namespace torch { namespace autograd { namespace generated { + +${py_return_types} + +}}} + +namespace torch::autograd { + +static void addReturnType( + PyObject* module, + const char* name, + PyTypeObject* type) { + // hold onto the TypeObject for the unlikely case of user + // deleting or overriding it. 
+  Py_INCREF(type);
+  if (PyModule_AddObject(
+          module,
+          name,
+          (PyObject*)type) != 0) {
+    Py_DECREF(type);
+    throw python_error();
+  }
+}
+
+void initReturnTypes(PyObject* module) {
+  static struct PyModuleDef def = {
+      PyModuleDef_HEAD_INIT, "torch._C._return_types", nullptr, -1, {}};
+  PyObject* return_types_module = PyModule_Create(&def);
+  if (!return_types_module) {
+    throw python_error();
+  }
+
+  ${py_return_types_registrations}
+
+  // steals a reference to return_types on success
+  if (PyModule_AddObject(module, "_return_types", return_types_module) != 0) {
+    Py_DECREF(return_types_module);
+    throw python_error();
+  }
+}
+
+} // namespace torch::autograd
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_return_types.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_return_types.h
new file mode 100644
index 0000000000000000000000000000000000000000..ce6c355ea146a272709255b898603764112168b9
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_return_types.h
@@ -0,0 +1,14 @@
+#pragma once
+
+namespace torch {
+namespace autograd {
+namespace generated {
+
+${py_return_types_declarations}
+
+}
+
+void initReturnTypes(PyObject* module);
+
+} // namespace autograd
+} // namespace torch
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_sparse_functions.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_sparse_functions.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..648d91442102e9b950cb2ddb8db545c4b4e1100e
--- /dev/null
+++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_sparse_functions.cpp
@@ -0,0 +1,67 @@
+#define TORCH_ASSERT_ONLY_METHOD_OPERATORS
+// ${generated_comment}
+
+#include "torch/csrc/Device.h"
+#include "torch/csrc/DynamicTypes.h"
+#include "torch/csrc/Exceptions.h"
+#include "torch/csrc/autograd/python_sparse_functions.h"
+#include "torch/csrc/autograd/python_variable.h"
+#include "torch/csrc/autograd/utils/wrap_outputs.h"
+#include "torch/csrc/autograd/utils/python_arg_parsing.h"
+#include "torch/csrc/utils/pycfunction_helpers.h"
+#include "torch/csrc/utils/python_arg_parser.h"
+#include "torch/csrc/utils/structseq.h"
+
+#ifndef AT_PER_OPERATOR_HEADERS
+#include <ATen/Functions.h>
+#else
+$ops_headers
+#endif
+
+using at::Tensor;
+using at::Scalar;
+using at::ScalarType;
+using at::MemoryFormat;
+using at::Generator;
+using at::IntArrayRef;
+using at::TensorList;
+
+using namespace torch::autograd::utils;
+
+namespace torch::autograd {
+
+// generated forward declarations start here
+
+${py_forwards}
+
+static PyMethodDef sparse_functions[] = {
+  ${py_method_defs}
+  {NULL}
+};
+
+static PyObject* THPSparseVariableFunctionsModule = NULL;
+
+void initSparseFunctions(PyObject* module) {
+  static struct PyModuleDef def = {
+     PyModuleDef_HEAD_INIT,
+     "torch._C._sparse",
+     NULL,
+     -1,
+     sparse_functions
+  };
+  PyObject* sparse = PyModule_Create(&def);
+  THPSparseVariableFunctionsModule = sparse;
+  if (!sparse) {
+    throw python_error();
+  }
+  // steals a reference to sparse
+  if (PyModule_AddObject(module, "_sparse", sparse) != 0) {
+    throw python_error();
+  }
+}
+
+// generated methods start here
+
+${py_methods}
+
+} // namespace torch::autograd
diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_special_functions.cpp
b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_special_functions.cpp new file mode 100644 index 0000000000000000000000000000000000000000..bf9e109b4a77352cd85ba828b97d67d329543867 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_special_functions.cpp @@ -0,0 +1,79 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +// ${generated_comment} + +#include "torch/csrc/Device.h" +#include "torch/csrc/DynamicTypes.h" +#include "torch/csrc/Exceptions.h" +#include "torch/csrc/autograd/python_special_functions.h" +#include "torch/csrc/autograd/generated/python_return_types.h" +#include "torch/csrc/autograd/python_variable.h" +#include "torch/csrc/autograd/utils/wrap_outputs.h" +#include "torch/csrc/autograd/utils/python_arg_parsing.h" +#include "torch/csrc/autograd/generated/variable_factories.h" +#include "torch/csrc/utils/out_types.h" +#include "torch/csrc/utils/pycfunction_helpers.h" +#include "torch/csrc/utils/python_arg_parser.h" +#include "torch/csrc/utils/structseq.h" +#include "torch/csrc/utils/device_lazy_init.h" + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +$ops_headers +#endif + +using at::Tensor; +using at::Device; +using at::Layout; +using at::Scalar; +using at::ScalarType; +using at::Backend; +using at::OptionalDeviceGuard; +using at::DeviceGuard; +using at::TensorOptions; +using at::IntArrayRef; +using at::Generator; +using at::TensorList; +using at::Dimname; +using at::DimnameList; + +using torch::utils::check_out_type_matches; +using namespace torch::autograd::utils; + +namespace torch::autograd { + +// generated forward declarations start here + +${py_forwards} + +static PyMethodDef special_functions[] = { + ${py_method_defs} + {NULL} +}; + +static PyObject* THPSpecialVariableFunctionsModule = NULL; + +void initSpecialFunctions(PyObject* module) { + static struct PyModuleDef def = { + PyModuleDef_HEAD_INIT, + "torch._C._special", + NULL, + -1, + special_functions + }; + PyObject* special = PyModule_Create(&def); + THPSpecialVariableFunctionsModule = special; + if (!special) { + throw python_error(); + } + // steals a reference to special + if (PyModule_AddObject(module, "_special", special) != 0) { + throw python_error(); + } +} + +// generated methods start here + +${py_methods} + +} // namespace torch::autograd diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_torch_functions.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_torch_functions.cpp new file mode 100644 index 0000000000000000000000000000000000000000..c17d1040e1892b6a215a8c4264fe5a5345265bc7 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_torch_functions.cpp @@ -0,0 +1,93 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +// ${generated_comment} + +// Python bindings for torch.* functions implemented through ATen. +// +// The functions are bound as static methods on a class +// torch._C._VariableFunctions which is also aliased as Variable._torch +// and also copied into 'torch' module. 
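+//
+// For example, after initialization a call such as `torch.add(x, y)`
+// dispatches through the same PyMethodDef entry that backs
+// torch._C._VariableFunctions.add; the public `torch.add` is that
+// binding copied into the `torch` module namespace.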
+ +#include + +// Undefine the copysign macro so that at::copysign works as intended with MSVC +// https://github.com/python/cpython/blob/c60394c7fc9cc09b16e9675a3eeb5844b6d8523f/PC/pyconfig.h#L196 +#ifdef _MSC_VER +#undef copysign +#endif // _MSC_VER + +#include "torch/csrc/autograd/python_torch_functions.h" +#include "torch/csrc/autograd/python_variable.h" +#include "torch/csrc/autograd/utils/wrap_outputs.h" +#include "torch/csrc/Dtype.h" +#include "torch/csrc/DynamicTypes.h" +#include "torch/csrc/Exceptions.h" +#include "torch/csrc/utils/out_types.h" +#include "torch/csrc/utils/pybind.h" +#include "torch/csrc/utils/pycfunction_helpers.h" +#include "torch/csrc/utils/python_arg_parser.h" +#include "torch/csrc/utils/tensor_layouts.h" +#include "torch/csrc/utils/tensor_new.h" +#include "torch/csrc/utils/tensor_numpy.h" +#include "torch/csrc/jit/frontend/tracer.h" +#include "torch/csrc/autograd/generated/variable_factories.h" +#include "torch/csrc/utils/structseq.h" +#include "torch/csrc/utils/device_lazy_init.h" +#include "torch/csrc/autograd/generated/python_return_types.h" + +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +$ops_headers +#endif + +#include +#include +#include +#include + +using at::Tensor; +using at::Device; +using at::Layout; +using at::Scalar; +using at::ScalarType; +using at::Backend; +using at::OptionalDeviceGuard; +using at::DeviceGuard; +using at::TensorOptions; +using at::IntArrayRef; +using at::Generator; +using at::TensorList; +using at::Dimname; +using at::DimnameList; +using at::ArrayRef; + +using torch::utils::check_out_type_matches; +using namespace torch::autograd::utils; + +// NOTE: See [Sharded File] comment in VariableType + +namespace torch::autograd { + +// generated forward declarations start here + +${py_forwards} + +static PyMethodDef torch_functions_shard[] = { + ${py_method_defs} +}; + +void gatherTorchFunctions${shard_id}(std::vector &torch_functions) { + constexpr size_t num_functions = sizeof(torch_functions_shard) / sizeof(torch_functions_shard[0]); + torch_functions.insert( + torch_functions.end(), + torch_functions_shard, + torch_functions_shard + num_functions); +} + +// generated methods start here + +${py_methods} + +} // namespace torch::autograd diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_variable_methods.cpp b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_variable_methods.cpp new file mode 100644 index 0000000000000000000000000000000000000000..bfc5b80835c4b203d96ea3a1952ae2fba897edf3 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/python_variable_methods.cpp @@ -0,0 +1,1338 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +// ${generated_comment} + +#include + +// Undefine the copysign macro so that at::copysign works as intended with MSVC +// https://github.com/python/cpython/blob/c60394c7fc9cc09b16e9675a3eeb5844b6d8523f/PC/pyconfig.h#L196 +#ifdef _MSC_VER +#undef copysign +#endif // _MSC_VER + +#include "torch/csrc/DynamicTypes.h" +#include "torch/csrc/Exceptions.h" +#include "torch/csrc/Size.h" +#include "torch/csrc/autograd/generated/VariableType.h" +#include "torch/csrc/autograd/python_variable.h" +#include "torch/csrc/autograd/utils/python_arg_parsing.h" +#include "torch/csrc/autograd/utils/error_messages.h" +#include "torch/csrc/autograd/utils/wrap_outputs.h" +#include "torch/csrc/jit/frontend/tracer.h" +#ifdef USE_CUDA +#include "torch/csrc/cuda/Event.h" +#endif 
+#include "torch/csrc/utils/device_lazy_init.h" +#include +#include "torch/csrc/utils/object_ptr.h" +#include "torch/csrc/utils/pycfunction_helpers.h" +#include "torch/csrc/utils/python_arg_parser.h" +#include "torch/csrc/utils/python_numbers.h" +#include "torch/csrc/utils/python_strings.h" +#include "torch/csrc/utils/tensor_apply.h" +#include "torch/csrc/utils/tensor_list.h" +#include "torch/csrc/utils/tensor_new.h" +#include "torch/csrc/utils/tensor_numpy.h" +#include "torch/csrc/utils/tensor_types.h" +#include "torch/csrc/autograd/generated/python_return_types.h" + +#include +#include +#include +#include "c10/core/Stream.h" + +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +$ops_headers +#include +#endif + +using at::device_of; +using at::OptionalDeviceGuard; +using at::Scalar; +using at::ScalarType; +using at::Tensor; +using c10::Stream; +using namespace torch::autograd::utils; + +namespace torch::autograd { + +static PyObject * THPVariable__is_view(PyObject *self, PyObject* args) +{ + HANDLE_TH_ERRORS + if (check_has_torch_function(self)) { + return handle_torch_function(self, "_is_view", args); + } + auto& self_ = THPVariable_Unpack(self); + if (self_.is_view()) { + Py_RETURN_TRUE; + } else { + Py_RETURN_FALSE; + } + END_HANDLE_TH_ERRORS +} + +// implemented on the python object bc no support for first-class functions in native_functions.yaml +// See: ATen/native/README.md for more context +static PyObject * THPVariable_apply_(PyObject* self, PyObject* arg) +{ + HANDLE_TH_ERRORS + if (check_has_torch_function(self)) { + auto args = py::make_tuple(py::handle(arg)); + return handle_torch_function(self, "apply_", args.ptr()); + } + auto& self_ = THPVariable_Unpack(self); + if (self_.requires_grad()) { + throw std::runtime_error( + "Can't call apply_() on Variable that requires grad. Use " + "var.detach().apply_() instead."); + } + return THPVariable_Wrap(torch::utils::apply_(self_, arg)); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_size(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "size(int64_t? dim=None)", + "size(Dimname dim)", + }); + auto& self_ = THPVariable_Unpack(self); + ParsedArgs<3> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + if (r.idx == 0) { + if (!r.toInt64Optional(0).has_value()) { + return THPSize_NewFromSymSizes(self_); + } + if (jit::tracer::isTracing()) { + // will error out if a tensor has symints + return wrap(jit::tracer::getSizeOf(self_, r.toInt64(0))); + } else { + return torch::toPyObject(self_.sym_size(r.toInt64(0))); + } + } else if (r.idx == 1) { + if (jit::tracer::isTracing()) { + TORCH_INTERNAL_ASSERT(false, "NYI: Named tensors w/ JIT"); + } + return wrap(self_.size(r.dimname(0))); + } + Py_RETURN_NONE; + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_stride(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "stride(int64_t? 
dim=None)", + "stride(Dimname dim)", + }); + auto& self_ = THPVariable_Unpack(self); + ParsedArgs<3> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + if (r.idx == 0) { + if (r.toInt64Optional(0).has_value()) { + return torch::toPyObject(self_.sym_stride(r.toInt64(0))); + } + // yes, this is called strides in ATen. + at::SymIntArrayRef strides = self_.sym_strides(); + // we can't do the normal wrapping here because IntArrayRef maps to both + // torch.Size and tuple in python + // TODO: consider factoring this out + THPObjectPtr tuple(PyTuple_New(static_cast(strides.size()))); + if (!tuple) throw python_error(); + for (size_t i = 0; i != strides.size(); i++) { + PyObject* s = torch::toPyObject(strides[i]); + if (!s) throw python_error(); + PyTuple_SET_ITEM(tuple.get(), i, s); + } + return tuple.release(); + } else if (r.idx == 1) { + return wrap(self_.stride(r.dimname(0))); + } + Py_RETURN_NONE; + END_HANDLE_TH_ERRORS +} + +// implemented on the python object to avoid dispatch overhead +static PyObject * THPVariable_get_device(PyObject* self_, PyObject* args) +{ + HANDLE_TH_ERRORS + if (check_has_torch_function(self_)) { + return handle_torch_function(self_, "get_device", args, nullptr); + } + auto& self = THPVariable_Unpack(self_); + return wrap(self.get_device()); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_has_names(PyObject* self_, PyObject* args) +{ + HANDLE_TH_ERRORS + if (check_has_torch_function(self_)) { + return handle_torch_function(self_, "has_names", args); + } + auto& self = THPVariable_Unpack(self_); + return wrap(self.has_names()); + END_HANDLE_TH_ERRORS +} + +// implemented on the python object to avoid dispatch overhead +static PyObject * THPVariable_data_ptr(PyObject* self_, PyObject* args) +{ + HANDLE_TH_ERRORS + if (check_has_torch_function(self_)) { + return handle_torch_function(self_, "data_ptr", args); + } + auto& self = THPVariable_Unpack(self_); + return wrap(self.data_ptr()); + END_HANDLE_TH_ERRORS +} + +// implemented on the python object to avoid dispatch overhead +static PyObject * THPVariable_storage_offset(PyObject* self_, PyObject* args) +{ + HANDLE_TH_ERRORS + if (check_has_torch_function(self_)) { + return handle_torch_function(self_, "storage_offset"); + } + auto& self = THPVariable_Unpack(self_); + return py::cast(self.sym_storage_offset()).release().ptr(); + END_HANDLE_TH_ERRORS +} + +// implemented on the python object to avoid dispatch overhead +static PyObject * THPVariable_dim(PyObject* self, PyObject* args) +{ + HANDLE_TH_ERRORS + if (check_has_torch_function(self)) { + return handle_torch_function(self, "dim", args); + } + auto& self_ = THPVariable_Unpack(self); + return THPUtils_packInt64(self_.dim()); + END_HANDLE_TH_ERRORS +} + +// implemented on the python object to avoid dispatch overhead +static PyObject * THPVariable_numel(PyObject* self, PyObject* args) +{ + HANDLE_TH_ERRORS + if (check_has_torch_function(self)) { + return handle_torch_function(self, "numel", args); + } + auto& self_ = THPVariable_Unpack(self); + if (jit::tracer::isTracing()) { + return wrap(jit::tracer::getNumelOf(self_)); + } else { + return py::cast(self_.sym_numel()).release().ptr(); + } + END_HANDLE_TH_ERRORS +} + +static Tensor dispatch_contiguous(const Tensor & self, at::MemoryFormat memory_format) { + pybind11::gil_scoped_release no_gil; + OptionalDeviceGuard device_guard(device_of(self)); + return 
self.contiguous(memory_format); +} + +static PyObject * THPVariable_contiguous(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "contiguous(*, MemoryFormat memory_format=contiguous_format)", + }); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto& self_ = THPVariable_Unpack(self); + auto memory_format = r.memoryformat(0); + // avoids touching the GIL or current device if self is already contiguous + if (self_.is_contiguous_or_false(memory_format)) { + // NOTE: this logic is duplicated from VariableType.cpp. Since we need to + // record this call to contiguous() in the trace regardless of whether + // we actually call contiguous here, we need to record this information + // manually. + if (jit::tracer::isTracing()) { + const auto& tracer_state = jit::tracer::getTracingState(); + auto op_name = c10::Symbol::fromQualString("aten::contiguous"); + auto node = tracer_state->createNode(op_name, /*num_outputs=*/0); + jit::tracer::recordSourceLocation(node); + jit::tracer::addInputs(node, "self", self_); + jit::tracer::addInputs(node, "memory_format", memory_format); + tracer_state->insertNode(node); + jit::tracer::addOutput(node, self_); + } + Py_INCREF(self); + return self; + } + return THPVariable_Wrap(dispatch_contiguous(self_, memory_format)); + END_HANDLE_TH_ERRORS +} + +static Tensor dispatch_copy_(const Tensor & self, const Tensor & other, bool non_blocking) { + pybind11::gil_scoped_release no_gil; + OptionalDeviceGuard device_guard(device_of(self)); + return self.copy_(other, non_blocking); +} + +static void maybe_warn_requires_grad(const Tensor & self) { + if (at::GradMode::is_enabled() && self.requires_grad()) { + TORCH_WARN_ONCE("Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior.\n" + "Consider using tensor.detach() first."); + } +} + + static PyObject * THPVariable_copy_(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "copy_(Tensor other, bool non_blocking=False)", + "copy_(Tensor other, bool async=False)|deprecated" + }); + auto& self_ = THPVariable_Unpack(self); + ParsedArgs<2> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + return THPVariable_Wrap(dispatch_copy_(self_, r.tensor(0), r.toBool(1))); + END_HANDLE_TH_ERRORS +} + +template +static T dispatch_to(const Tensor & self) { + pybind11::gil_scoped_release no_gil; + OptionalDeviceGuard device_guard(device_of(self)); + TORCH_CHECK_VALUE(self.sym_numel() == 1, "only one element tensors can be converted to Python scalars"); + return self.template item(); +} + +static PyObject * THPVariable_float_scalar(PyObject* self, PyObject* args) { + HANDLE_TH_ERRORS + if (check_has_torch_function(self)) { + return handle_torch_function(self, "__float__", args); + } + jit::tracer::warn("Converting a tensor to a Python float", jit::tracer::WARN_PYTHON_DATAFLOW); + auto& self_ = THPVariable_Unpack(self); + maybe_warn_requires_grad(self_); + return wrap(dispatch_to(self_)); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_complex_scalar(PyObject* self, PyObject* args) { + HANDLE_TH_ERRORS + if (check_has_torch_function(self)) { + return handle_torch_function(self, 
"__complex__", args); + } + jit::tracer::warn("Converting a tensor to a Python complex", jit::tracer::WARN_PYTHON_DATAFLOW); + auto& self_ = THPVariable_Unpack(self); + maybe_warn_requires_grad(self_); + return wrap(dispatch_to>(self_)); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_integral_scalar(PyObject* self, PyObject* args) { + HANDLE_TH_ERRORS + if (check_has_torch_function(self)) { + return handle_torch_function(self, "__int__", args); + } + jit::tracer::warn("Converting a tensor to a Python integer", jit::tracer::WARN_PYTHON_DATAFLOW); + auto& self_ = THPVariable_Unpack(self); + if (isFloatingType(self_.scalar_type())) { + // we can't dispatch to item here because we want to avoid ATen overflow checks; + // the python integral type (long in python2) can't overflow. + return THPUtils_packDoubleAsInt(dispatch_to(self_)); + } else { + return wrap(dispatch_to(self_)); + } + END_HANDLE_TH_ERRORS +} + +// This is the __index__ function in Python which is similar to __int__, but +// called when used as a slice. +static PyObject * THPVariable_index_scalar(PyObject* self, PyObject* args) { + HANDLE_TH_ERRORS + if (check_has_torch_function(self)) { + return handle_torch_function(self, "__index__", args); + } + auto& self_ = THPVariable_Unpack(self); + // TODO: change the condition to `self_.dim() != 0` once we expose scalars + // in PyTorch. + if (!isIntegralType(self_.scalar_type(), /*includeBool=*/true) || self_.sym_numel() != 1) { + throw TypeError("only integer tensors of a single element can be converted to an index"); + } + return wrap(dispatch_to(self_)); + END_HANDLE_TH_ERRORS +} + +static Tensor dispatch_invert(const Tensor & self) { + pybind11::gil_scoped_release no_gil; + OptionalDeviceGuard device_guard(device_of(self)); + return self.bitwise_not(); +} + +static PyObject * THPVariable_invert(PyObject* self, PyObject* args) { + HANDLE_TH_ERRORS + if (check_has_torch_function(self)) { + return handle_torch_function(self, "__invert__", args); + } + auto& self_ = THPVariable_Unpack(self); + if (!isIntegralType(self_.scalar_type(), /*includeBool=*/true)) { + throw TypeError("~ (operator.invert) is only implemented on integer and Boolean-type tensors"); + } + return THPVariable_Wrap(dispatch_invert(self_)); + END_HANDLE_TH_ERRORS +} + +static Tensor dispatch_to(const Tensor & self, Device device, bool non_blocking, bool copy, std::optional optional_memory_format) { + pybind11::gil_scoped_release no_gil; + // NOTE: this is where we record aten::to in the graph during tracing. However, the behavior of aten::to + // is different with respect to TensorOptions fields that are not present: aten::to inherits fields that + // are missing from the self argument while the tracer assumes that they should be populated with the + // default values (eg. float for scalar type). 
By explicitly copying over the tensor options here we fully + // specify all tensor options and thus record the proper trace + return self.to(self.options().device(device).memory_format(optional_memory_format), non_blocking, copy); +} + +static Tensor dispatch_to(const Tensor & self, bool non_blocking, bool copy, std::optional optional_memory_format) { + pybind11::gil_scoped_release no_gil; + return self.to(self.options().memory_format(optional_memory_format), non_blocking, copy); +} + +static Tensor dispatch_to(const Tensor & self, ScalarType dtype, bool non_blocking, bool copy, std::optional optional_memory_format) { + pybind11::gil_scoped_release no_gil; + // TODO: Make this call the TensorOptions version, maybe? + return self.to(dtype, non_blocking, copy, optional_memory_format); +} + +static Tensor dispatch_to(const Tensor & self, Device device, ScalarType dtype, bool non_blocking, bool copy, std::optional optional_memory_format) { + pybind11::gil_scoped_release no_gil; + // TODO: Make this call the TensorOptions version, maybe? + return self.to(device, dtype, non_blocking, copy, optional_memory_format); +} + +static PyObject * THPVariable_cpu(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "cpu(*, MemoryFormat? memory_format=None)" + }); + auto& self_ = THPVariable_Unpack(self); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto opt_memory_format = r.memoryformatOptional(0); + return THPVariable_Wrap(dispatch_to(self_, at::Device(at::DeviceType::CPU), false, false, opt_memory_format)); + END_HANDLE_TH_ERRORS +} + +static Tensor dispatch_nonzero(const Tensor & self) { + pybind11::gil_scoped_release no_gil; + OptionalDeviceGuard device_guard(device_of(self)); + return self.nonzero(); +} + +static std::vector dispatch_nonzero_numpy(const Tensor & self) { + pybind11::gil_scoped_release no_gil; + OptionalDeviceGuard device_guard(device_of(self)); + return self.nonzero_numpy(); +} + +static PyObject * THPVariable_nonzero(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "nonzero()", + "nonzero(*, bool as_tuple)", + }); + auto& self_ = THPVariable_Unpack(self); + ParsedArgs<2> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + if (r.idx == 0 || (r.idx == 1 && !r.toBool(0))) { + return wrap(dispatch_nonzero(self_)); + } else { + return wrap(dispatch_nonzero_numpy(self_)); + } + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_cuda(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "cuda(Device? device=None, bool non_blocking=False, *, MemoryFormat? memory_format=None)", + "cuda(Device? device=None, bool async=False, *, MemoryFormat? memory_format=None)|deprecated" + }); + auto& self_ = THPVariable_Unpack(self); + ParsedArgs<3> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto device = r.isNone(0) ? 
at::Device(at::DeviceType::CUDA) : r.device(0); + auto opt_memory_format = r.memoryformatOptional(2); + TORCH_CHECK(device.is_cuda(), "Invalid device, must be cuda device"); + torch::utils::device_lazy_init(at::kCUDA); + return THPVariable_Wrap(dispatch_to(self_, device, r.toBool(1), false, opt_memory_format)); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_mtia(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "mtia(Device? device=None, bool non_blocking=False, *, MemoryFormat? memory_format=None)", + "mtia(Device? device=None, bool async=False, *, MemoryFormat? memory_format=None)|deprecated" + }); + auto& self_ = THPVariable_Unpack(self); + ParsedArgs<3> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if (r.has_torch_function()) { + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto device = r.isNone(0) ? at::Device(at::DeviceType::MTIA) : r.device(0); + auto opt_memory_format = r.memoryformatOptional(2); + TORCH_CHECK(device.is_mtia(), "Invalid device, must be MTIA device"); + torch::utils::device_lazy_init(at::kMTIA); + return THPVariable_Wrap(dispatch_to(self_, device, r.toBool(1), false, opt_memory_format)); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_xpu(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "xpu(Device? device=None, bool non_blocking=False, *, MemoryFormat? memory_format=None)", + "xpu(Device? device=None, bool async=False, *, MemoryFormat? memory_format=None)|deprecated" + }); + auto& self_ = THPVariable_Unpack(self); + ParsedArgs<3> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if (r.has_torch_function()) { + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto device = r.isNone(0) ? at::Device(at::DeviceType::XPU) : r.device(0); + auto opt_memory_format = r.memoryformatOptional(2); + TORCH_CHECK(device.is_xpu(), "Invalid device, must be xpu device"); + torch::utils::device_lazy_init(at::kXPU); + return THPVariable_Wrap(dispatch_to(self_, device, r.toBool(1), false, opt_memory_format)); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_ipu(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "ipu(Device? device=None, bool non_blocking=False, *, MemoryFormat? memory_format=None)", + "ipu(Device? device=None, bool async=False, *, MemoryFormat? memory_format=None)|deprecated" + }); + auto& self_ = THPVariable_Unpack(self); + ParsedArgs<3> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if (r.has_torch_function()) { + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto device = r.isNone(0) ? 
at::Device(at::DeviceType::IPU) : r.device(0); + auto opt_memory_format = r.memoryformatOptional(2); + TORCH_CHECK(device.is_ipu(), "Invalid device, must be ipu device"); + return THPVariable_Wrap(dispatch_to(self_, device, r.toBool(1), false, opt_memory_format)); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_to_type(PyObject* self, ScalarType scalarType, std::optional optional_memory_format) { + HANDLE_TH_ERRORS + auto& self_ = THPVariable_Unpack(self); + return THPVariable_Wrap(dispatch_to(self_, scalarType, false, false, optional_memory_format)); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_byte(PyObject* self, PyObject* args, PyObject* kwargs) { + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "byte(*, MemoryFormat? memory_format=None)" + }); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto opt_memory_format = r.memoryformatOptional(0); + return THPVariable_to_type(self, ScalarType::Byte, opt_memory_format); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_char(PyObject* self, PyObject* args, PyObject* kwargs) { + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "char(*, MemoryFormat? memory_format=None)" + }); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto opt_memory_format = r.memoryformatOptional(0); + return THPVariable_to_type(self, ScalarType::Char, opt_memory_format); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_double(PyObject* self, PyObject* args, PyObject* kwargs) { + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "double(*, MemoryFormat? memory_format=None)" + }); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto opt_memory_format = r.memoryformatOptional(0); + return THPVariable_to_type(self, ScalarType::Double, opt_memory_format); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_float(PyObject* self, PyObject* args, PyObject* kwargs) { + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "float(*, MemoryFormat? memory_format=None)" + }); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto opt_memory_format = r.memoryformatOptional(0); + return THPVariable_to_type(self, ScalarType::Float, opt_memory_format); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_cdouble(PyObject* self, PyObject* args, PyObject* kwargs) { + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "cdouble(*, MemoryFormat? 
memory_format=None)" + }); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto opt_memory_format = r.memoryformatOptional(0); + return THPVariable_to_type(self, ScalarType::ComplexDouble, opt_memory_format); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_cfloat(PyObject* self, PyObject* args, PyObject* kwargs) { + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "cfloat(*, MemoryFormat? memory_format=None)" + }); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto opt_memory_format = r.memoryformatOptional(0); + return THPVariable_to_type(self, ScalarType::ComplexFloat, opt_memory_format); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_half(PyObject* self, PyObject* args, PyObject* kwargs) { + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "half(*, MemoryFormat? memory_format=None)" + }); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto opt_memory_format = r.memoryformatOptional(0); + return THPVariable_to_type(self, ScalarType::Half, opt_memory_format); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_int(PyObject* self, PyObject* args, PyObject* kwargs) { + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "int(*, MemoryFormat? memory_format=None)" + }); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto opt_memory_format = r.memoryformatOptional(0); + return THPVariable_to_type(self, ScalarType::Int, opt_memory_format); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_long(PyObject* self, PyObject* args, PyObject* kwargs) { + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "long(*, MemoryFormat? memory_format=None)" + }); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto opt_memory_format = r.memoryformatOptional(0); + return THPVariable_to_type(self, ScalarType::Long, opt_memory_format); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_short(PyObject* self, PyObject* args, PyObject* kwargs) { + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "short(*, MemoryFormat? memory_format=None)" + }); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto opt_memory_format = r.memoryformatOptional(0); + return THPVariable_to_type(self, ScalarType::Short, opt_memory_format); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_bool(PyObject* self, PyObject* args, PyObject* kwargs) { + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "bool(*, MemoryFormat? 
memory_format=None)" + }); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto opt_memory_format = r.memoryformatOptional(0); + return THPVariable_to_type(self, ScalarType::Bool, opt_memory_format); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_bfloat16(PyObject* self, PyObject* args, PyObject* kwargs) { + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "bfloat16(*, MemoryFormat? memory_format=None)" + }); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + auto opt_memory_format = r.memoryformatOptional(0); + return THPVariable_to_type(self, ScalarType::BFloat16, opt_memory_format); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_element_size(PyObject* self, PyObject* args) +{ + HANDLE_TH_ERRORS + if (check_has_torch_function(self)) { + return handle_torch_function(self, "element_size", args); + } + auto& self_ = THPVariable_Unpack(self); + return THPUtils_packInt64(self_.element_size()); + END_HANDLE_TH_ERRORS +} + +// implemented on the python object bc PyObjects not declarable in native_functions.yaml +// See: ATen/native/README.md for more context +static PyObject * THPVariable_numpy(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "numpy(*, bool force=False)" + }); + auto& self_ = THPVariable_Unpack(self); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if (r.has_torch_function()) { + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + jit::tracer::warn("Converting a tensor to a NumPy array", jit::tracer::WARN_PYTHON_DATAFLOW); + return torch::utils::tensor_to_numpy(self_, r.toBool(0)); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_requires_grad_(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "requires_grad_(bool requires_grad=True)", + }); + auto& self_ = THPVariable_Unpack(self); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + // temporary hack to improve functorch UX. + const auto& functorch_tls = at::functorch::functorchTLSAccessor(); + if (functorch_tls) { + functorch_tls->checkSupportsInplaceRequiresGrad(); + } + + auto requires_grad = r.toBool(0); + // should we throw if requires_grad is true? var.requires_grad = True throws here + // but it's nice to let this be a no-op. + if (!self_.is_leaf() && !requires_grad) { + throw std::runtime_error(autograd::utils::requires_grad_leaf_error(requires_grad)); + } + if (requires_grad && ! 
isDifferentiableType(at::typeMetaToScalarType(self_.dtype()))) { + throw std::runtime_error("only Tensors of floating point dtype can require gradients"); + } + self_.set_requires_grad(requires_grad); + return THPVariable_Wrap(self_); + END_HANDLE_TH_ERRORS +} + +static inline bool dispatch_is_contiguous(const Tensor & self, MemoryFormat memory_format) { + return self.is_contiguous(memory_format); +} + +// implemented on the python object to avoid dispatch overhead +static PyObject * THPVariable_is_contiguous(PyObject* self_, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "is_contiguous(*, MemoryFormat memory_format=contiguous_format)", + }); + ParsedArgs<1> parsed_args; + auto r = parser.parse(self_, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self_, args, kwargs, PyObject_Type(self_), "torch.Tensor"); + } + + auto memory_format = r.memoryformat(0); + auto& self = THPVariable_Unpack(self_); + return wrap(dispatch_is_contiguous(self, memory_format)); + END_HANDLE_TH_ERRORS +} + +// implemented on the python object to avoid dispatch overhead +static PyObject * THPVariable_item(PyObject* self, PyObject* args) +{ + HANDLE_TH_ERRORS + if (check_has_torch_function(self)) { + return handle_torch_function(self, "item", args); + } + jit::tracer::warn("Converting a tensor to a Python number", jit::tracer::WARN_PYTHON_DATAFLOW); + auto& self_ = THPVariable_Unpack(self); + auto dispatch_item_ = [](const Tensor& self) -> at::Scalar { + pybind11::gil_scoped_release no_gil; + return self.item(); + }; + return py::cast(dispatch_item_(self_)).release().ptr(); + END_HANDLE_TH_ERRORS +} + +// implemented on the python object bc no support for first class functions in native_functions.yaml +// See: ATen/native/README.md for more context +static PyObject * THPVariable_map_(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ "map_(Tensor other, PyObject* callable)" }); + auto& self_ = THPVariable_Unpack(self); + ParsedArgs<2> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + Variable other = r.tensor(0); + if (self_.requires_grad() || other.requires_grad()) { + throw std::runtime_error( + "Can't call map_() on Variable that requires grad. 
Use " + "var.detach().map_() instead."); + } + TORCH_CHECK( + !self_.unsafeGetTensorImpl()->is_python_dispatch() && !other.unsafeGetTensorImpl()->is_python_dispatch(), + ".map_ is not supported for tensor subclasses."); + + return THPVariable_Wrap(torch::utils::map_(self_, other, r.pyobject(1))); + END_HANDLE_TH_ERRORS +} + +// implemented on the python object bc no support for first class functions in native_functions.yaml +// See: ATen/native/README.md for more context +static PyObject * THPVariable_map2_(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ "map2_(Tensor x, Tensor y, PyObject* callable)" }); + auto& self_ = THPVariable_Unpack(self); + ParsedArgs<3> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + Variable x = r.tensor(0); + Variable y = r.tensor(1); + if (self_.requires_grad() || x.requires_grad() || y.requires_grad()) { + throw std::runtime_error( + "Can't call map2_() on Variable that requires grad. Use " + "var.detach().map2_() instead."); + } + TORCH_CHECK( + !x.unsafeGetTensorImpl()->is_python_dispatch() && !y.unsafeGetTensorImpl()->is_python_dispatch(), + ".map2_ is not supported for tensor subclasses."); + return THPVariable_Wrap(torch::utils::map2_(self_, x, y, r.pyobject(2))); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_new(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + if (check_has_torch_function(self)) { + return handle_torch_function(self, "new", args, kwargs); + } + auto& self_ = THPVariable_Unpack(self); + OptionalDeviceGuard device_guard(device_of(self_)); + return THPVariable_Wrap(torch::utils::legacy_tensor_new(legacyExtractDispatchKey(self_), self_.scalar_type(), args, kwargs)); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_new_tensor(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + if (check_has_torch_function(self)) { + return handle_torch_function(self, "new_tensor", args, kwargs); + } + auto& self_ = THPVariable_Unpack(self); + OptionalDeviceGuard device_guard(device_of(self_)); + return THPVariable_Wrap(torch::utils::new_tensor(legacyExtractDispatchKey(self_), self_.scalar_type(), args, kwargs)); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_storage(PyObject* self, PyObject* arg) +{ + HANDLE_TH_ERRORS + if (check_has_torch_function(self)) { + return handle_torch_function(self, "untyped_storage"); + } + auto& self_ = THPVariable_Unpack(self); + return createPyObject(self_.storage()); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_to(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "to(Device device=None, ScalarType dtype=None, bool non_blocking=False, bool copy=False, *, MemoryFormat? memory_format=None)", + "to(ScalarType dtype, bool non_blocking=False, bool copy=False, *, MemoryFormat? memory_format=None)", + "to(Tensor tensor, bool non_blocking=False, bool copy=False, *, MemoryFormat? 
memory_format=None)", + }); + ParsedArgs<5> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + if (r.has_torch_function()) { + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + auto parsed = parse_to_conversion(r, /*allow_copy*/ true); + auto& device = std::get<0>(parsed); + auto& scalarType = std::get<1>(parsed); + auto non_blocking = std::get<2>(parsed); + auto copy = std::get<3>(parsed); + auto opt_memory_format = std::get<4>(parsed); + auto& self_ = THPVariable_Unpack(self); + torch::utils::maybe_initialize_device(device); + if (!device && !scalarType && !copy && !opt_memory_format.has_value()) { + Py_INCREF(self); + return self; + } else if (!device && !scalarType) { + return THPVariable_Wrap( + dispatch_to(self_, non_blocking, copy, opt_memory_format)); + } else if (!device) { + return THPVariable_Wrap(dispatch_to(self_, *scalarType, non_blocking, copy, opt_memory_format)); + } else if (!scalarType) { + return THPVariable_Wrap(dispatch_to(self_, *device, non_blocking, copy, opt_memory_format)); + } else { + return THPVariable_Wrap(dispatch_to(self_, *device, *scalarType, non_blocking, copy, opt_memory_format)); + } + Py_RETURN_NONE; + END_HANDLE_TH_ERRORS +} + +// implemented on the python object b/c arbitrarily nested list not declarable in native_functions.yaml +// See: ATen/native/README.md for more context +static PyObject * THPVariable_tolist(PyObject* self, PyObject* args) +{ + HANDLE_TH_ERRORS + if (check_has_torch_function(self)) { + return handle_torch_function(self, "tolist", args); + } + jit::tracer::warn("Converting a tensor to a Python list", jit::tracer::WARN_PYTHON_DATAFLOW); + auto self_ = THPVariable_Unpack(self); + return torch::utils::tensor_to_list(self_); + END_HANDLE_TH_ERRORS +} + +static PyObject * THPVariable_type(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + static PythonArgParser parser({ + "type(PyObject* dtype=None, bool non_blocking=False, *, MemoryFormat? memory_format=None)", + "type(PyObject* dtype=None, bool async=False, *, MemoryFormat? 
memory_format=None)|deprecated" + }); + auto& self_ = THPVariable_Unpack(self); + ParsedArgs<3> parsed_args; + auto r = parser.parse(self, args, kwargs, parsed_args); + + if(r.has_torch_function()){ + return handle_torch_function(r, self, args, kwargs, THPVariableClass, "torch.Tensor"); + } + + if (r.isNone(0)) { + return THPUtils_packString(torch::utils::options_to_string(self_.options())); + } + auto obj = r.pyobject(0); + auto opt_memory_format = r.memoryformatOptional(2); + std::string type_name; + bool is_dtype = false; + if (PyType_Check(obj)) { + if (obj == THPVariableClass) { + type_name = "torch.Tensor"; + } else { + type_name = ((PyTypeObject*)obj)->tp_name; + } + } else if (THPUtils_checkString(obj)) { + type_name = THPUtils_unpackString(obj); + } else if (THPDtype_Check(obj)) { + is_dtype = true; + } else { + throw TypeError("dtype must be a type, str, or dtype object"); + } + Device device = self_.device(); + if (is_dtype) { + auto scalar_type = r.scalartype(0); + return THPVariable_Wrap(dispatch_to(self_, scalar_type, /*non_blocking=*/ r.toBool(1), /*copy=*/ false, opt_memory_format)); + } + at::TensorOptions options = torch::utils::options_from_string(type_name); + auto scalar_type = at::typeMetaToScalarType(options.dtype()); + auto device_type = options.device().type(); + if (device_type != device.type()) { + device = at::Device(device_type); + } + torch::utils::maybe_initialize_device(device); + return THPVariable_Wrap(dispatch_to(self_, device, scalar_type, /*non_blocking=*/ r.toBool(1), /*copy=*/ false, opt_memory_format)); + END_HANDLE_TH_ERRORS +} + +// generated methods start here + +${py_methods} + +static PyObject * THPVariable_bool_scalar(PyObject* self, PyObject* args) { + if (check_has_torch_function(self)) { + HANDLE_TH_ERRORS + return handle_torch_function(self, "__bool__", args); + END_HANDLE_TH_ERRORS + } + jit::tracer::warn("Converting a tensor to a Python boolean", jit::tracer::WARN_PYTHON_DATAFLOW); + return THPVariable_is_nonzero(self, args); +} + +static PyObject * THPVariable___eq__(PyObject* self_, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS +#ifdef USE_NUMPY + if (torch::utils::is_numpy_available()) { + static PythonArgParser parser({ + "__eq__(PyObject* other)", + }, /*traceable=*/true); + + ParsedArgs<1> parsed_args; + auto _r = parser.parse(self_, args, kwargs, parsed_args); + if(_r.has_torch_function()) { + return handle_torch_function(_r, self_, args, kwargs, THPVariableClass, "torch.Tensor"); + } + switch (_r.idx) { + case 0: { + auto other = _r.pyobject(0); + if (PyArray_Check(other)) { + auto other_tensor = torch::utils::tensor_from_numpy(other); + auto dispatch_eq = [](const at::Tensor & self, const at::Tensor & other) -> at::Tensor { + pybind11::gil_scoped_release no_gil; + return self.eq(other); + }; + const Tensor& self = THPVariable_Unpack(self_); + return wrap(dispatch_eq(self, other_tensor)); + } + } + } + } +#endif + return THPVariable_eq(self_, args, kwargs); + Py_RETURN_NONE; + END_HANDLE_TH_ERRORS +} + +// Wrapper converts a raised TypeError into returning NotImplemented +// Used to implement binary arithmetic operators +template +static PyObject * TypeError_to_NotImplemented_(PyObject* self, PyObject* args, PyObject* kwargs) { + + PyObject* ret = Func(self, args, kwargs); + if (!ret && PyErr_ExceptionMatches(PyExc_TypeError)) { + PyErr_Clear(); + Py_INCREF(Py_NotImplemented); + ret = Py_NotImplemented; + } + return ret; +} + +// set_ has to be defined in the template because the c10::Storage object +// does not have a 
type, and we need to make sure the Python storage object's +// type matches the tensor's type +static PyObject* THPVariable_set_( + PyObject* self_, + PyObject* args, + PyObject* kwargs) { + HANDLE_TH_ERRORS + const Tensor& self = THPVariable_Unpack(self_); + static PythonArgParser parser( + { + "set_()", + "set_(Storage source)", + "set_(Storage source, SymInt storage_offset, SymIntArrayRef size, SymIntArrayRef stride=None)", + "set_(Tensor source)", + "set_(Tensor source, SymInt storage_offset, SymIntArrayRef size, SymIntArrayRef stride=None)", + }, + /*traceable=*/false); + + ParsedArgs<4> parsed_args; + auto _r = parser.parse(args, kwargs, parsed_args); + + switch (_r.idx) { + case 0: { + // aten::set_(Tensor(a!) self) -> Tensor(a!) + auto dispatch_set_ = [](const Tensor& self) -> Tensor { + pybind11::gil_scoped_release no_gil; + return self.set_(); + }; + return wrap(dispatch_set_(self)); + } + case 1: { + // aten::set_.source_Storage(Tensor(a!) self, Storage source) -> + // Tensor(a!) + at::ScalarType storage_scalar_type{}; + bool is_typed_storage = true; + at::Storage storage = _r.storage(0, storage_scalar_type, is_typed_storage); + TORCH_CHECK(storage_scalar_type == self.dtype() || !is_typed_storage, + "Expected a Storage of type ", self.dtype(), + " or an UntypedStorage, but got type ", storage_scalar_type, + " for argument 1 'storage'"); + auto dispatch_set_ = [](const Tensor& self, Storage source) -> Tensor { + pybind11::gil_scoped_release no_gil; + return self.set_(std::move(source)); + }; + return wrap(dispatch_set_(self, storage)); + } + case 2: { + // aten::set_.source_Storage_storage_offset(Tensor(a!) self, Storage + // source, int storage_offset, int[] size, int[] stride=[]) -> Tensor(a!) + at::ScalarType storage_scalar_type{}; + bool is_typed_storage = true; + at::Storage storage = _r.storage(0, storage_scalar_type, is_typed_storage); + TORCH_CHECK(storage_scalar_type == self.dtype() || !is_typed_storage, + "Expected a Storage of type ", self.dtype(), + " or an UntypedStorage, but got type ", storage_scalar_type, + " for argument 1 'storage'"); + auto dispatch_set_ = [](const Tensor& self, + Storage source, + c10::SymInt storage_offset, + c10::SymIntArrayRef size, + c10::SymIntArrayRef stride) -> Tensor { + pybind11::gil_scoped_release no_gil; + return self.set__symint(std::move(source), std::move(storage_offset), size, stride); + }; + return wrap(dispatch_set_( + self, storage, _r.toSymInt(1), _r.symintlist(2), _r.symintlist(3))); + } + case 3: { + // aten::set_.source_Tensor(Tensor(a!) self, Tensor source) -> Tensor(a!) + auto dispatch_set_ = [](const Tensor& self, const Tensor& source) -> Tensor { + TORCH_CHECK(source.dtype() == self.dtype(), "Could not set tensor of type ", source.dtype(), " to a tensor of type ", self.dtype()); + pybind11::gil_scoped_release no_gil; + return self.set_(source); + }; + return wrap(dispatch_set_(self, _r.tensor(0))); + } + case 4: { + // aten::set_.source_Tensor_storage_offset(Tensor(a!) self, Tensor + // source, int storage_offset, int[] size, int[] stride=[]) -> Tensor(a!) 
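+      // (Note: like the Storage overloads above, this re-points `self` at
+      // `source`'s storage with the given offset/size/stride; no data is
+      // copied.)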
+ at::Tensor storage = _r.tensor(0); + auto dispatch_set_ = [](const Tensor& self, + const Tensor& source, + c10::SymInt storage_offset, + c10::SymIntArrayRef size, + c10::SymIntArrayRef stride) -> Tensor { + pybind11::gil_scoped_release no_gil; + return self.set__symint(source, std::move(storage_offset), size, stride); + }; + return wrap(dispatch_set_( + self, storage, _r.toSymInt(1), _r.symintlist(2), _r.symintlist(3))); + } + } + Py_RETURN_NONE; + END_HANDLE_TH_ERRORS +} + +// XXX: ops that are bound here are not exposed to the C++ api nor the JIT. +// Any new ops added here should be accompanied with a comment why they are not +// being registered through native_functions.yaml, and be tagged cpp / JIT +PyMethodDef variable_methods[] = { + // These magic methods are all implemented on python object to wrap NotImplementedError + {"__add__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_add>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__radd__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_add>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__iadd__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_add_>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__rmul__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_mul>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__mul__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_mul>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__imul__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_mul_>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__sub__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_sub>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__isub__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_sub_>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__div__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_div>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__truediv__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_div>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__floordiv__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_floor_divide>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__idiv__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_div_>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__ifloordiv__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_floor_divide_>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__mod__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_remainder>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__imod__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_remainder_>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__eq__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable___eq__>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__ne__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_ne>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__lt__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_lt>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__le__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_le>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__gt__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_gt>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__ge__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_ge>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__rand__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable___and__>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__ror__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable___or__>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"__rxor__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable___xor__>), METH_VARARGS |
METH_KEYWORDS, nullptr}, + {"__bool__", THPVariable_bool_scalar, METH_NOARGS, nullptr}, + {"__float__", THPVariable_float_scalar, METH_NOARGS, nullptr}, + {"__complex__", THPVariable_complex_scalar, METH_NOARGS, nullptr}, + {"__int__", THPVariable_integral_scalar, METH_NOARGS, nullptr}, + {"__long__", THPVariable_integral_scalar, METH_NOARGS, nullptr}, + {"__index__", THPVariable_index_scalar, METH_NOARGS, nullptr}, + {"__nonzero__", THPVariable_bool_scalar, METH_NOARGS, nullptr}, + {"__invert__", THPVariable_invert, METH_NOARGS, nullptr}, + {"__matmul__", castPyCFunctionWithKeywords(TypeError_to_NotImplemented_<THPVariable_matmul>), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"_is_view", THPVariable__is_view, METH_NOARGS, nullptr}, + {"apply_", THPVariable_apply_, METH_O, nullptr}, + {"bfloat16", castPyCFunctionWithKeywords(THPVariable_bfloat16), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"byte", castPyCFunctionWithKeywords(THPVariable_byte), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"char", castPyCFunctionWithKeywords(THPVariable_char), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"contiguous", castPyCFunctionWithKeywords(THPVariable_contiguous), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"copy_", castPyCFunctionWithKeywords(THPVariable_copy_), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"cpu", castPyCFunctionWithKeywords(THPVariable_cpu), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"cuda", castPyCFunctionWithKeywords(THPVariable_cuda), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"mtia", castPyCFunctionWithKeywords(THPVariable_mtia), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"xpu", castPyCFunctionWithKeywords(THPVariable_xpu), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"ipu", castPyCFunctionWithKeywords(THPVariable_ipu), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"data_ptr", THPVariable_data_ptr, METH_NOARGS, nullptr}, + {"dim", THPVariable_dim, METH_NOARGS, nullptr}, + {"has_names", THPVariable_has_names, METH_NOARGS, nullptr}, + {"double", castPyCFunctionWithKeywords(THPVariable_double), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"cdouble", castPyCFunctionWithKeywords(THPVariable_cdouble), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"element_size", THPVariable_element_size, METH_NOARGS, nullptr}, + {"float", castPyCFunctionWithKeywords(THPVariable_float), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"cfloat", castPyCFunctionWithKeywords(THPVariable_cfloat), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"get_device", THPVariable_get_device, METH_NOARGS, nullptr}, + {"bool", castPyCFunctionWithKeywords(THPVariable_bool), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"half", castPyCFunctionWithKeywords(THPVariable_half), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"int", castPyCFunctionWithKeywords(THPVariable_int), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"is_contiguous", castPyCFunctionWithKeywords(THPVariable_is_contiguous), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"item", THPVariable_item, METH_NOARGS, nullptr}, + {"long", castPyCFunctionWithKeywords(THPVariable_long), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"map_", castPyCFunctionWithKeywords(THPVariable_map_), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"map2_", castPyCFunctionWithKeywords(THPVariable_map2_), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"ndimension", THPVariable_dim, METH_NOARGS, nullptr}, + {"nelement", THPVariable_numel, METH_NOARGS, nullptr}, + {"new", castPyCFunctionWithKeywords(THPVariable_new), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"new_tensor", castPyCFunctionWithKeywords(THPVariable_new_tensor), METH_VARARGS | METH_KEYWORDS,
nullptr}, + {"nonzero", castPyCFunctionWithKeywords(THPVariable_nonzero), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"numel", THPVariable_numel, METH_NOARGS, nullptr}, + {"numpy", castPyCFunctionWithKeywords(THPVariable_numpy), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"requires_grad_", castPyCFunctionWithKeywords(THPVariable_requires_grad_), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"set_", castPyCFunctionWithKeywords(THPVariable_set_), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"short", castPyCFunctionWithKeywords(THPVariable_short), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"size", castPyCFunctionWithKeywords(THPVariable_size), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"untyped_storage", THPVariable_storage, METH_NOARGS, nullptr}, + {"storage_offset", THPVariable_storage_offset, METH_NOARGS, nullptr}, + {"stride", castPyCFunctionWithKeywords(THPVariable_stride), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"to", castPyCFunctionWithKeywords(THPVariable_to), METH_VARARGS | METH_KEYWORDS, nullptr}, + {"tolist", THPVariable_tolist, METH_NOARGS, nullptr}, + {"type", castPyCFunctionWithKeywords(THPVariable_type), METH_VARARGS | METH_KEYWORDS, nullptr}, + ${py_method_defs} + {nullptr} +}; + +} // namespace torch::autograd diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/variable_factories.h b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/variable_factories.h new file mode 100644 index 0000000000000000000000000000000000000000..2b55f441ab6249cb7963c5e4a15070f626f775b7 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/packaged/autograd/templates/variable_factories.h @@ -0,0 +1,135 @@ +#pragma once + +// ${generated_comment} + +#include <ATen/core/Tensor.h> +#include <ATen/TracerMode.h> +#include <ATen/core/grad_mode.h> +#include <c10/util/ArrayRef.h> +#include <c10/core/MemoryFormat.h> +#include <torch/csrc/api/include/torch/detail/TensorDataContainer.h> +#include <torch/csrc/autograd/variable.h> + +#ifndef AT_PER_OPERATOR_HEADERS +#include <ATen/Functions.h> +#else +#include <ATen/ops/from_blob.h> +$ops_headers +#endif + +#include <functional> +#include <initializer_list> +#include <utility> + +namespace torch { + +/// NOTE: Currently `torch::tensor(...)` doesn't support mixed data types +/// (i.e. `torch::tensor({{bool, 2.0}})` doesn't work). We might be able to +/// support it in the future by iterating over all sub-lists to find +/// the largest data type that can represent all of the elements, or by using +/// variadic templates. +/// +/// NOTE: C++ `torch::tensor` with a floating-point type or an `at::ArrayRef` / `std::vector` / +/// (nested) braced-init-list of floating-point types always produces a tensor of dtype +/// `torch::get_default_dtype()`, matching Python `torch.tensor` behavior. +/// +/// NOTE: C++ `torch::tensor` with an integer type or an `at::ArrayRef` / `std::vector` / +/// (nested) braced-init-list of integer types always produces a tensor of dtype `at::kLong` +/// (aka. int64_t), matching Python `torch.tensor` behavior. +/// +/// NOTE: The following dtypes are not supported by `torch::tensor` currently: +/// - `unsigned int` +/// - `unsigned long int` +/// - `unsigned long long int` +/// - `long long int` +inline at::Tensor tensor(detail::TensorDataContainer tensor_data_container, const at::TensorOptions& options = {}) { + return autograd::make_variable( + // note: we remove the requires_grad setting from the TensorOptions because + // it is ignored anyways (and we actually have an assertion that it isn't set + // which would fail otherwise). We handle requires_grad explicitly here + // instead of passing it through to the kernel.
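+    // Illustrative examples of the dtype rules described in the NOTEs above
+    // (an explanatory sketch, not part of the upstream template):
+    //   torch::tensor({1, 2, 3});      // integer list  -> dtype at::kLong
+    //   torch::tensor({1.0, 2.0});     // floating list -> torch::get_default_dtype()
+    //   torch::tensor({true, false});  // bool list     -> dtype at::kBool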
+ tensor_data_container.convert_to_tensor(options.requires_grad(::std::nullopt)), + options.requires_grad()); +} + +/// A generic deleter function. +using Deleter = std::function<void(void*)>; +using at::MemoryFormat; + +/// Exposes the given `data` as a `Tensor` without taking ownership of the +/// original data. `sizes` should specify the shape of the tensor, `strides` the +/// stride in each dimension. The `deleter` function (a +/// `std::function<void(void*)>`) will be called on the `data` when the Tensor +/// data would normally be deallocated. The `TensorOptions` specify additional +/// configuration options for the returned tensor, such as what type to +/// interpret the `data` as. +inline at::Tensor from_blob( + void* data, + at::IntArrayRef sizes, + at::IntArrayRef strides, + const Deleter& deleter, + const at::TensorOptions& options = at::TensorOptions()) { + at::Tensor tensor = ([&]() { + at::AutoDispatchBelowAutograd guard; // TODO: remove + at::tracer::impl::NoTracerDispatchMode tracer_guard; + return at::from_blob(data, sizes, strides, deleter, options.requires_grad(::std::nullopt)); + })(); + return autograd::make_variable(tensor, options.requires_grad()); +} + +/// Exposes the given `data` as a `Tensor` without taking ownership of the +/// original data. `sizes` should specify the shape of the tensor, `strides` the +/// stride in each dimension. The `TensorOptions` +/// specify additional configuration options for the returned tensor, such as +/// what type to interpret the `data` as. +inline at::Tensor from_blob( + void* data, + at::IntArrayRef sizes, + at::IntArrayRef strides, + const at::TensorOptions& options = at::TensorOptions()) { + at::Tensor tensor = ([&]() { + at::AutoDispatchBelowAutograd guard; // TODO: remove + at::tracer::impl::NoTracerDispatchMode tracer_guard; + return at::from_blob(data, sizes, strides, options.requires_grad(::std::nullopt)); + })(); + return autograd::make_variable(tensor, options.requires_grad()); +} + +/// Exposes the given `data` as a `Tensor` without taking ownership of the +/// original data. `sizes` should specify the shape of the tensor. The `deleter` +/// (a `std::function<void(void*)>`) function will be called on the `data` when +/// the Tensor data would normally be deallocated. The `TensorOptions` specify +/// additional configuration options for the returned tensor, such as what type +/// to interpret the `data` as. +inline at::Tensor from_blob( + void* data, + at::IntArrayRef sizes, + const Deleter& deleter, + const at::TensorOptions& options = at::TensorOptions()) { + at::Tensor tensor = ([&]() { + at::AutoDispatchBelowAutograd guard; // TODO: remove + at::tracer::impl::NoTracerDispatchMode tracer_guard; + return at::from_blob(data, sizes, deleter, options.requires_grad(::std::nullopt)); + })(); + return autograd::make_variable(tensor, options.requires_grad()); +} + +/// Exposes the given `data` as a `Tensor` without taking ownership of the +/// original data. `sizes` should specify the shape of the tensor. The +/// `TensorOptions` specify additional configuration options for the returned
/// tensor, such as what type to interpret the `data` as.
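+// A minimal usage sketch for the `from_blob` overloads above (illustrative
+// only; `from_blob` does not take ownership, so the buffer must outlive the
+// returned tensor unless a deleter is supplied):
+//   float data[] = {1.f, 2.f, 3.f, 4.f};
+//   auto t = torch::from_blob(data, /*sizes=*/{2, 2},
+//                             at::TensorOptions().dtype(at::kFloat));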
+inline at::Tensor from_blob( + void* data, + at::IntArrayRef sizes, + const at::TensorOptions& options = at::TensorOptions()) { + at::Tensor tensor = ([&]() { + at::AutoDispatchBelowAutograd guard; // TODO: remove + at::tracer::impl::NoTracerDispatchMode tracer_guard; + return at::from_blob(data, sizes, options.requires_grad(::std::nullopt)); + })(); + return autograd::make_variable(tensor, options.requires_grad()); +} + +${function_definitions} + +} // namespace torch diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..2aa9105d79b9e8cacdb74442554d28e207df236d Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/__pycache__/operator.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/__pycache__/operator.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..8d28bb9ac3e50a3b04b3e2a1acb2a8128c3cd006 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/__pycache__/operator.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/__pycache__/selector.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/__pycache__/selector.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..417b865286f29181b9b7f69c77778a3d729e1c2f Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/__pycache__/selector.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/operator.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/operator.py new file mode 100644 index 0000000000000000000000000000000000000000..8047f033e3d2b0209e03924b355e94a06eceace6 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/operator.py @@ -0,0 +1,171 @@ +from __future__ import annotations + +from dataclasses import dataclass + + +# This class holds information about a single operator used to determine +# the outcome of a selective/custom PyTorch build that doesn't include +# registration code for all the supported operators. This is done to +# reduce the size of the generated binary so that it can be deployed in +# situations where binary size comes at a premium. +# +@dataclass(frozen=True) +class SelectiveBuildOperator: + # The name of the operator. This includes the aten::, etc... prefix + # The operator name may or may not have the overload name. If this + # operator name does not specify an overload name, the way to determine + # if this entry refers to the family of operators with this base name + # or just the operator with this name is to look at the value of the + # 'include_all_overloads' flag in this class. 
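+    # For example (illustrative): an entry named "aten::add" with
+    # include_all_overloads=True covers every overload sharing that base name
+    # (such as "aten::add.Tensor" and "aten::add.out"), whereas an entry named
+    # "aten::add.Tensor" refers to that single overload only.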
+ name: str + + # True if this is a root operator (i.e. called directly from a + # TorchScript model, etc...). An operator is considered to be a + # root operator if it is called directly from any one of the models + # that this instance of the pytorch library was built for. Hence, it + # may not be a root operator in all of the models that are used in + # this instance of the pytorch library. + is_root_operator: bool + + # Is this operator used for on-device training? If True, then we need to + # use the information to generate code in VariableType_N.cpp for registration + # of training related operators. Again, this is True if this operator + # is used for training in one or more models used by this instance of the + # pytorch library. + is_used_for_training: bool + + # If True, it indicates that this operator instance (object) refers to an + # operator without the overload name and should apply to all overloads + # which have this operator name as the base name. This flag is applicable + # only for objects that have operator names without a DOT (period) character + # in them. + # + # Note: This flag is a temporary workaround to grandfather in the current + # static selective (custom) build mechanism, which largely ignores overload + # names when determining whether to select operators for registration + # purposes. + include_all_overloads: bool + + # Debug Information at the operator level + _debug_info: tuple[str, ...] | None + + @staticmethod + def from_yaml_dict( + op_name: str, op_info: dict[str, object] + ) -> SelectiveBuildOperator: + allowed_keys = { + "name", + "is_root_operator", + "is_used_for_training", + "include_all_overloads", + "debug_info", + } + + if len(set(op_info.keys()) - allowed_keys) > 0: + raise Exception( # noqa: TRY002 + "Got unexpected top level keys: {}".format( + ",".join(set(op_info.keys()) - allowed_keys), + ) + ) + + if "name" in op_info: + assert op_name == op_info["name"] + + is_root_operator = op_info.get("is_root_operator", True) + assert isinstance(is_root_operator, bool) + + is_used_for_training = op_info.get("is_used_for_training", True) + assert isinstance(is_used_for_training, bool) + + include_all_overloads = op_info.get("include_all_overloads", True) + assert isinstance(include_all_overloads, bool) + + debug_info: tuple[str, ...] | None = None + if "debug_info" in op_info: + di_list = op_info["debug_info"] + assert isinstance(di_list, list) + debug_info = tuple(str(x) for x in di_list) + + return SelectiveBuildOperator( + name=op_name, + is_root_operator=is_root_operator, + is_used_for_training=is_used_for_training, + include_all_overloads=include_all_overloads, + _debug_info=debug_info, + ) + + @staticmethod + def from_legacy_operator_name_without_overload( + name: str, + ) -> SelectiveBuildOperator: + return SelectiveBuildOperator( + name=name, + is_root_operator=True, + is_used_for_training=True, + include_all_overloads=True, + _debug_info=None, + ) + + def to_dict(self) -> dict[str, object]: + ret: dict[str, object] = { + "is_root_operator": self.is_root_operator, + "is_used_for_training": self.is_used_for_training, + "include_all_overloads": self.include_all_overloads, + } + if self._debug_info is not None: + ret["debug_info"] = self._debug_info + + return ret + + +def merge_debug_info( + lhs: tuple[str, ...] | None, + rhs: tuple[str, ...] | None, +) -> tuple[str, ...] | None: + # Ensure that when merging, each entry shows up just once. 
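+    # A sketch of the intended behavior:
+    #   merge_debug_info(("model_a",), ("model_a", "model_b"))
+    # yields a tuple containing "model_a" and "model_b" exactly once each,
+    # in unspecified order (the implementation round-trips through a set).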
+ if lhs is None and rhs is None: + return None + + return tuple(set((lhs or ()) + (rhs or ()))) + + +def combine_operators( + lhs: SelectiveBuildOperator, rhs: SelectiveBuildOperator +) -> SelectiveBuildOperator: + if str(lhs.name) != str(rhs.name): + raise Exception( # noqa: TRY002 + f"Expected both arguments to have the same name, but got '{str(lhs.name)}' and '{str(rhs.name)}' instead" + ) + + return SelectiveBuildOperator( + name=lhs.name, + # Consider this operator to be a root operator if it is a + # root operator in any of the models used in this instance of + # the pytorch library. + is_root_operator=lhs.is_root_operator or rhs.is_root_operator, + # Consider this operator to be a training operator if it is + # an operator used for training in any of the models used + # in this instance of the pytorch library. + is_used_for_training=lhs.is_used_for_training or rhs.is_used_for_training, + include_all_overloads=lhs.include_all_overloads or rhs.include_all_overloads, + _debug_info=merge_debug_info(lhs._debug_info, rhs._debug_info), + ) + + +def merge_operator_dicts( + lhs: dict[str, SelectiveBuildOperator], + rhs: dict[str, SelectiveBuildOperator], +) -> dict[str, SelectiveBuildOperator]: + operators: dict[str, SelectiveBuildOperator] = {} + for op_name, op in list(lhs.items()) + list(rhs.items()): + new_op = op + if op_name in operators: + new_op = combine_operators(operators[op_name], op) + + operators[op_name] = new_op + + return operators + + +def strip_operator_overload_name(op_name: str) -> str: + return op_name.split(".", maxsplit=1)[0] diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/selector.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/selector.py new file mode 100644 index 0000000000000000000000000000000000000000..04acc354203ade2f48dcef56fd9d9ef70c82ad1d --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/selective_build/selector.py @@ -0,0 +1,352 @@ +from __future__ import annotations + +from collections import defaultdict +from collections.abc import Iterable +from dataclasses import dataclass +from typing import TYPE_CHECKING + +import yaml + +from torchgen.selective_build.operator import ( + merge_debug_info, + merge_operator_dicts, + SelectiveBuildOperator, + strip_operator_overload_name, +) + + +if TYPE_CHECKING: + from torchgen.model import NativeFunction + + +# A SelectiveBuilder holds information extracted from the selective build +# YAML specification. +# +# It includes information about the build's selectivity, the debug_info +# associated with this selective build (opaque string), and the set of +# operators that should be included in the build. +# +@dataclass(frozen=True) +class SelectiveBuilder: + # If true, then the build is not selective, and includes all + # operators. + include_all_operators: bool + + # Debug Information at the selective/custom build level. + _debug_info: tuple[str, ...] | None + + # A dictionary of operator -> operator metadata. + operators: dict[str, SelectiveBuildOperator] + + # A dictionary of selected kernel tags and dtypes. Typically a + # PyTorch Operator Kernel (function) may have many code paths + # that are specialized for many many Tensor dtypes, so it's not + # one per kernel function, but there could be many per kernel + # function. The tag isn't a kernel function name, but some fragment + # of the kernel function implementation itself. + kernel_metadata: dict[str, list[str]] + + # ExecuTorch only. 
A dictionary of kernel tag -> list of (list of input + # dtypes for tensor-like input args). + # This is from selective.yaml + et_kernel_metadata: dict[str, list[str]] + + # A set of all the custom torch bind classes used by the selected models + # Stored as a set internally to remove duplicates proactively, but written + # as a list to yamls + custom_classes: set[str] + + # A set of all the build features used by the selected models + # Stored as a set internally to remove duplicates proactively, but written + # as a list to yamls + build_features: set[str] + + # If true, then fragments for all dtypes for all kernel functions + # are included as well as all custom classes. This is typically set when any one of the + # operator lists is generated from a mechanism other than + # tracing based selective build. + include_all_non_op_selectives: bool + + @staticmethod + def get_nop_selector() -> SelectiveBuilder: + return SelectiveBuilder.from_yaml_dict({"include_all_operators": True}) + + @staticmethod + def from_yaml_dict(data: dict[str, object]) -> SelectiveBuilder: + valid_top_level_keys = { + "include_all_non_op_selectives", + "include_all_operators", + "debug_info", + "operators", + "kernel_metadata", + "et_kernel_metadata", + "custom_classes", + "build_features", + } + top_level_keys = set(data.keys()) + if len(top_level_keys - valid_top_level_keys) > 0: + raise Exception( # noqa: TRY002 + "Got unexpected top level keys: {}".format( + ",".join(top_level_keys - valid_top_level_keys), + ) + ) + include_all_operators = data.get("include_all_operators", False) + assert isinstance(include_all_operators, bool) + + debug_info = None + if "debug_info" in data: + di_list = data["debug_info"] + assert isinstance(di_list, list) + + debug_info = tuple(str(x) for x in di_list) + + operators = {} + operators_dict = data.get("operators", {}) + assert isinstance(operators_dict, dict) + + for k, v in operators_dict.items(): + operators[k] = SelectiveBuildOperator.from_yaml_dict(k, v) + + kernel_metadata = {} + kernel_metadata_dict = data.get("kernel_metadata", {}) + assert isinstance(kernel_metadata_dict, dict) + + for k, v in kernel_metadata_dict.items(): + kernel_metadata[str(k)] = [str(dtype) for dtype in v] + + et_kernel_metadata = data.get("et_kernel_metadata", {}) + assert isinstance(et_kernel_metadata, dict) + + custom_classes = data.get("custom_classes", []) + assert isinstance(custom_classes, Iterable) + custom_classes = set(custom_classes) + + build_features = data.get("build_features", []) + assert isinstance(build_features, Iterable) + build_features = set(build_features) + + include_all_non_op_selectives = data.get("include_all_non_op_selectives", False) + assert isinstance(include_all_non_op_selectives, bool) + + return SelectiveBuilder( + include_all_operators, + debug_info, + operators, + kernel_metadata, + et_kernel_metadata, + custom_classes, # type: ignore[arg-type] + build_features, # type: ignore[arg-type] + include_all_non_op_selectives, + ) + + @staticmethod + def from_yaml_str(config_contents: str) -> SelectiveBuilder: + contents = yaml.safe_load(config_contents) + return SelectiveBuilder.from_yaml_dict(contents) + + @staticmethod + def from_yaml_path(config_path: str) -> SelectiveBuilder: + with open(config_path) as f: + contents = yaml.safe_load(f) + return SelectiveBuilder.from_yaml_dict(contents) + + @staticmethod + def from_legacy_op_registration_allow_list( + allow_list: set[str], is_root_operator: bool, is_used_for_training: bool + ) -> SelectiveBuilder: + operators = {} + 
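+        # For example (illustrative): allow_list = {"aten::add", "aten::mul"}
+        # produces one operator entry per name, each with
+        # include_all_overloads=True, so every overload of those base names
+        # is selected.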
for op in allow_list: + operators[op] = { + "name": op, + "is_root_operator": is_root_operator, + "is_used_for_training": is_used_for_training, + "include_all_overloads": True, + } + return SelectiveBuilder.from_yaml_dict( + { + "operators": operators, + "include_all_non_op_selectives": True, + } + ) + + def is_operator_selected(self, name: str) -> bool: + if self.include_all_operators: + return True + + if name in self.operators: + return True + name = strip_operator_overload_name(name) + return name in self.operators and self.operators[name].include_all_overloads + + def is_native_function_selected(self, func: NativeFunction) -> bool: + op_name = op_name_from_native_function(func) + return self.is_operator_selected(op_name) + + def is_operator_selected_for_training(self, name: str) -> bool: + if not self.is_operator_selected(name): + return False + if self.include_all_operators: + return True + + not_training_op = SelectiveBuildOperator( + name="", + is_root_operator=False, + is_used_for_training=False, + include_all_overloads=False, + _debug_info=None, + ) + op = not_training_op + if name in self.operators: + op = self.operators[name] + + name = strip_operator_overload_name(name) + base_op = not_training_op + if name in self.operators: + base_op = self.operators[name] + + return op.is_used_for_training or ( + base_op.include_all_overloads and base_op.is_used_for_training + ) + + def is_native_function_selected_for_training(self, func: NativeFunction) -> bool: + op_name = op_name_from_native_function(func) + return self.is_operator_selected_for_training(op_name) + + def is_root_operator(self, name: str) -> bool: + if not self.is_operator_selected(name): + return False + if self.include_all_operators: + return True + + if name in self.operators: + op: SelectiveBuildOperator = self.operators[name] + return op.is_root_operator + name = strip_operator_overload_name(name) + if name not in self.operators: + return False + base_op: SelectiveBuildOperator = self.operators[name] + return base_op.include_all_overloads and base_op.is_root_operator + + def is_kernel_dtype_selected(self, kernel_tag: str, dtype: str) -> bool: + if self.include_all_operators or self.include_all_non_op_selectives: + return True + + return ( + kernel_tag in self.kernel_metadata + and dtype in self.kernel_metadata[kernel_tag] + ) + + def et_get_selected_kernels(self, op_name: str, kernel_key: list[str]) -> list[str]: + """ + Return a list of kernel keys that cover the used ops + """ + # If no kernel metadata, either it's implied by include_all_operators=True or the op is not used. + if op_name not in self.et_kernel_metadata: + return kernel_key if self.include_all_operators else [] + # Otherwise, only return the specific kernel keys. 
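+        # The kernel keys compared below are version-prefixed strings such as
+        # "v1/6;0,1" (that exact format is an assumption here); the match
+        # ignores the version fragment before "/" and compares only the
+        # dtype/dim-order fragment after it.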
+ + result_set = set() + + for model_kernel_keys in self.et_kernel_metadata[op_name]: + key_found = False + for key in kernel_key: + # Don't compare the version for now + if ( + key != "default" + and key.split("/")[1] == model_kernel_keys.split("/")[1] + ): + result_set.add(key) + key_found = True + break + if not key_found: + if "default" not in kernel_key: + raise Exception("Missing kernel for the model") # noqa: TRY002 + else: + result_set.add("default") + + return list(result_set) + + def to_dict(self) -> dict[str, object]: + ret: dict[str, object] = { + "include_all_non_op_selectives": self.include_all_non_op_selectives, + "include_all_operators": self.include_all_operators, + } + operators = {} + for op_name, op in self.operators.items(): + operators[op_name] = op.to_dict() + ret["operators"] = operators + + if self._debug_info is not None: + ret["debug_info"] = sorted(self._debug_info) + + ret["kernel_metadata"] = { + k: sorted(v) for (k, v) in self.kernel_metadata.items() + } + + ret["et_kernel_metadata"] = self.et_kernel_metadata + + ret["custom_classes"] = sorted(self.custom_classes) + + ret["build_features"] = sorted(self.build_features) + + return ret + + +def merge_kernel_metadata( + lhs: dict[str, list[str]], + rhs: dict[str, list[str]], +) -> dict[str, list[str]]: + kernel_metadata: dict[str, list[str]] = {} + for tag_name, dtypes in list(lhs.items()) + list(rhs.items()): + dtypes_copy = set(dtypes) + if tag_name in kernel_metadata: + dtypes_copy |= set(kernel_metadata[tag_name]) + + kernel_metadata[tag_name] = list(dtypes_copy) + + return kernel_metadata + + +def merge_et_kernel_metadata( + lhs: dict[str, list[str]], + rhs: dict[str, list[str]], +) -> dict[str, list[str]]: + merge_et_kernel_metadata: dict[str, set[str]] = defaultdict(set) + for op in list(lhs.keys()) + list(rhs.keys()): + merge_et_kernel_metadata[op].update(lhs.get(op, [])) + merge_et_kernel_metadata[op].update(rhs.get(op, [])) + + return {op: sorted(val) for op, val in merge_et_kernel_metadata.items()} + + +def combine_selective_builders( + lhs: SelectiveBuilder, rhs: SelectiveBuilder +) -> SelectiveBuilder: + include_all_operators = lhs.include_all_operators or rhs.include_all_operators + debug_info = merge_debug_info(lhs._debug_info, rhs._debug_info) + operators = merge_operator_dicts(lhs.operators, rhs.operators) + kernel_metadata = merge_kernel_metadata(lhs.kernel_metadata, rhs.kernel_metadata) + et_kernel_metadata = merge_et_kernel_metadata( + lhs.et_kernel_metadata, rhs.et_kernel_metadata + ) + include_all_non_op_selectives = ( + lhs.include_all_non_op_selectives or rhs.include_all_non_op_selectives + ) + custom_classes = lhs.custom_classes.union(rhs.custom_classes) + build_features = lhs.build_features.union(rhs.build_features) + return SelectiveBuilder( + include_all_operators, + debug_info, + operators, + kernel_metadata, + et_kernel_metadata, + custom_classes, + build_features, + include_all_non_op_selectives, + ) + + +def op_name_from_native_function(f: NativeFunction) -> str: + # This was originally read from the 'operator_name_with_overload' field in the + # declaration dict, which was the part before the first '(' in 'schema_string'. 
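+    # For example, a native function with schema
+    # "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor"
+    # yields "aten::add.Tensor" here.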
+ return f"{f.namespace}::{f.func.name}" diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..cba6d2574c6438c93c848f32fbe008ba078946e3 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/__pycache__/config.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/__pycache__/config.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..e1da6a2490bfc869893c310dd35e514f0ab95ff8 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/__pycache__/config.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/__pycache__/gen_static_runtime_ops.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/__pycache__/gen_static_runtime_ops.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..f6872a43f198969958e534a307b3a543a14648e1 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/__pycache__/gen_static_runtime_ops.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/__pycache__/generator.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/__pycache__/generator.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..f2b2a8f35f770e00c877b06b84f6de0ea26e081b Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/__pycache__/generator.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/config.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/config.py new file mode 100644 index 0000000000000000000000000000000000000000..9fe129f9754dd83a136fbf9dc4478e04a2242efa --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/config.py @@ -0,0 +1,388 @@ +from __future__ import annotations + +from torchgen.model import NativeFunctionsGroup, NativeFunctionsViewGroup + + +def func_name_base_str(g: NativeFunctionsGroup | NativeFunctionsViewGroup) -> str: + if isinstance(g, NativeFunctionsGroup): + return str(g.functional.func.name.name.base) + else: + return str(g.view.root_name) + + +is_hand_written_ops_ = frozenset( + ( + "abs", + "add", + "addmm", + "all", + "any", + "argmin", + "bmm", + "clamp", + "clamp_min", + "cumsum", + "div", + "fmod", + "index_select", + "leaky_relu", + "linear", + "log", + "matmul", + "mul", + "narrow_copy", + "nonzero", + "pow", + "remainder", + "sigmoid", + "sign", + "sub", + "tanh", + "detach", + "expand_as", + "flatten", + "narrow", + "reshape_as", + "select", + "slice", + "softmax", + "split", + "squeeze", + "transpose", + "view", + "where", + ) +) + + +def is_hand_written(g: 
NativeFunctionsGroup | NativeFunctionsViewGroup) -> bool: + name_base = func_name_base_str(g) + return name_base in is_hand_written_ops_ + + +def override_test_values(arg_map: dict[str, str], op_name: str, index: int) -> None: + assert index == 0 or index == 1 + if op_name == "addr": + if index == 0: + arg_map["self"] = "at::rand({6, 6})" + arg_map["vec1"] = "at::rand({6})" + arg_map["vec2"] = "at::rand({6})" + else: + arg_map["self"] = "at::rand({22, 22})" + arg_map["vec1"] = "at::rand({22})" + arg_map["vec2"] = "at::rand({22})" + return + if op_name == "mv": + if index == 0: + arg_map["self"] = "at::rand({6, 6})" + arg_map["vec"] = "at::rand({6})" + else: + arg_map["self"] = "at::rand({22, 22})" + arg_map["vec"] = "at::rand({22})" + return + if op_name == "addbmm": + if index == 0: + arg_map["self"] = "at::rand({6, 6})" + else: + arg_map["self"] = "at::rand({22, 22})" + return + if op_name == "cross": + if index == 0: + arg_map["self"] = "at::rand({3, 3, 3})" + arg_map["other"] = "at::rand({3, 3, 3})" + else: + arg_map["self"] = "at::rand({22, 3, 22})" + arg_map["other"] = "at::rand({22, 3, 22})" + return + if op_name == "take": + if index == 0: + arg_map["index"] = "at::randint(0, 216, {20}, torch::kInt64)" + else: + arg_map["index"] = "at::randint(0, 1000, {100}, torch::kInt64)" + return + if op_name == "take_along_dim": + if index == 0: + arg_map["indices"] = "at::argsort(self0, 1, true)" + else: + arg_map["indices"] = "at::argsort(self1, 1, true)" + return + if op_name == "masked_select": + if index == 0: + arg_map["mask"] = "at::randn({6, 6, 6}) > 0.5" + else: + arg_map["mask"] = "at::rand({22, 22, 22}) > 0.5" + return + if op_name == "orgqr": + if index == 0: + arg_map["input2"] = "at::rand({6, 6})" + else: + arg_map["input2"] = "at::rand({22, 22})" + return + if op_name == "ormqr": + if index == 0: + arg_map["input2"] = "at::rand({6, 6})" + else: + arg_map["input2"] = "at::rand({22, 22})" + return + if op_name == "quantile": + if index == 0: + arg_map["q"] = "at::rand({6})" + arg_map["interpolation"] = '"linear"' + else: + arg_map["q"] = "at::rand({22})" + arg_map["interpolation"] = '"linear"' + return + if op_name == "nanquantile": + if index == 0: + arg_map["q"] = "at::rand({6})" + arg_map["interpolation"] = '"linear"' + else: + arg_map["q"] = "at::rand({22})" + arg_map["interpolation"] = '"linear"' + return + if op_name == "multi_margin_loss": + if index == 0: + arg_map["self"] = "at::rand({6, 6})" + arg_map["target"] = "at::randint(6, {6}, torch::kInt64)" + arg_map["weight"] = "at::rand({6})" + else: + arg_map["self"] = "at::rand({22, 22})" + arg_map["target"] = "at::randint(22, {22}, torch::kInt64)" + arg_map["weight"] = "at::rand({22})" + return + if op_name == "multilabel_margin_loss": + if index == 0: + arg_map["self"] = "at::rand({6, 6})" + arg_map["target"] = "at::randint(6, {6, 6}, torch::kInt64)" + else: + arg_map["self"] = "at::rand({22, 22})" + arg_map["target"] = "at::randint(22, {22, 22}, torch::kInt64)" + return + if op_name == "nll_loss": + if index == 0: + arg_map["self"] = "at::rand({6, 6})" + arg_map["target"] = "at::randint(6, {6}, torch::kInt64)" + arg_map["weight"] = "at::rand({6})" + else: + arg_map["self"] = "at::rand({22, 22})" + arg_map["target"] = "at::randint(22, {22}, torch::kInt64)" + arg_map["weight"] = "at::rand({22})" + return + if op_name == "nll_loss2d": + if index == 0: + arg_map["self"] = "at::rand({6, 6, 6, 6})" + arg_map["target"] = "at::randint(6, {6, 6, 6}, torch::kInt64)" + arg_map["weight"] = "at::rand({6})" + else: + arg_map["self"] = 
"at::rand({22, 22, 22, 22})" + arg_map["target"] = "at::randint(22, {22, 22, 22}, torch::kInt64)" + arg_map["weight"] = "at::rand({22})" + return + if op_name in ( + "fft_fft", + "fft_ifft", + "fft_rfft", + "fft_irfft", + "fft_hfft", + "fft_ihfft", + ): + arg_map["norm"] = '"forward"' + return + if op_name == "linalg_tensorinv": + if index == 0: + arg_map["self"] = "at::rand({6, 6, 6, 6})" + arg_map["ind"] = "2" + else: + arg_map["self"] = "at::rand({22, 22, 22, 22})" + arg_map["ind"] = "2" + return + if op_name == "addmv": + if index == 0: + arg_map["self"] = "at::rand({2})" + arg_map["mat"] = "at::rand({2, 2})" + arg_map["vec"] = "at::rand({2})" + else: + arg_map["self"] = "at::rand({35})" + arg_map["mat"] = "at::rand({35, 35})" + arg_map["vec"] = "at::rand({35})" + return + if op_name == "acosh": + if index == 0: + arg_map["self"] = "at::rand({2, 2, 2}) + at::ones({2, 2, 2})" + else: + arg_map["self"] = "at::rand({5, 5, 5}) + at::ones({5, 5, 5})" + return + if op_name == "adaptive_max_pool2d_backward": + if index == 0: + arg_map["grad_output"] = "at::rand({2, 2, 2}, at::kFloat)" + arg_map["self"] = "at::rand({2, 2, 2}, at::kFloat)" + arg_map["indices"] = "at::randint(0, 1, {2, 2, 2}, at::kLong)" + else: + arg_map["grad_output"] = "at::rand({3, 3, 3}, at::kFloat)" + arg_map["self"] = "at::rand({3, 3, 3}, at::kFloat)" + arg_map["indices"] = "at::randint(0, 1, {3, 3, 3}, at::kLong)" + return + if op_name == "adaptive_max_pool3d_backward": + if index == 0: + arg_map["grad_output"] = "at::rand({2, 2, 2, 2}, at::kFloat)" + arg_map["self"] = "at::rand({2, 2, 2, 2}, at::kFloat)" + arg_map["indices"] = "at::randint(0, 1, {2, 2, 2, 2}, at::kLong)" + else: + arg_map["grad_output"] = "at::rand({3, 3, 3, 3}, at::kFloat)" + arg_map["self"] = "at::rand({3, 3, 3, 3}, at::kFloat)" + arg_map["indices"] = "at::randint(0, 1, {3, 3, 3, 3}, at::kLong)" + return + if op_name == "bitwise_left_shift": + if index == 0: + arg_map["self"] = "at::randint(1, 1 << 4, {6, 6, 6}, at::kInt)" + arg_map["other"] = "at::randint(1, 26, {6, 6, 6}, at::kInt)" + else: + arg_map["self"] = "at::randint(1, 1 << 4, {22, 22, 22}, at::kInt)" + arg_map["other"] = "at::randint(1, 26, {22, 22, 22}, at::kInt)" + return + if op_name == "bitwise_right_shift": + if index == 0: + arg_map["self"] = "at::randint(1 << 21, 1 << 30, {6, 6, 6}, at::kInt)" + arg_map["other"] = "at::randint(1, 22, {6, 6, 6}, at::kInt)" + else: + arg_map["self"] = "at::randint(1 << 21, 1 << 30, {22, 22, 22}, at::kInt)" + arg_map["other"] = "at::randint(1, 22, {22, 22, 22}, at::kInt)" + return + if op_name == "gather": + if index == 0: + arg_map["self"] = "at::randint(1, 100, {2,2,2}, at::kInt)" + arg_map["dim"] = "1" + arg_map["index"] = "at::randint(0, 1, {2,2,2}, torch::kInt64)" + arg_map["sparse_grad"] = "false" + else: + arg_map["self"] = "at::randint(1, 100, {5,5,5}, at::kInt)" + arg_map["dim"] = "1" + arg_map["index"] = "at::randint(0, 4, {5,5,5}, torch::kInt64)" + arg_map["sparse_grad"] = "false" + return + if op_name == "gelu": + if index == 0: + arg_map["self"] = "at::rand({6, 6, 6})" + arg_map["approximate"] = '"tanh"' + else: + arg_map["self"] = "at::rand({22, 22, 22})" + arg_map["approximate"] = '"tanh"' + return + if op_name == "gelu_backward": + if index == 0: + arg_map["grad_output"] = "at::rand({6, 6, 6})" + arg_map["self"] = "at::rand({6, 6, 6})" + arg_map["approximate"] = '"tanh"' + else: + arg_map["grad_output"] = "at::rand({22, 22, 22})" + arg_map["self"] = "at::rand({22, 22, 22})" + arg_map["approximate"] = '"tanh"' + return + if op_name == 
"index_add": + if index == 0: + arg_map["self"] = "at::rand({2})" + arg_map["dim"] = "0" + arg_map["index"] = "at::randint(0, 1, {2}, at::kInt)" + arg_map["source"] = "at::rand({2})" + arg_map["alpha"] = "2" + else: + arg_map["self"] = "at::rand({16})" + arg_map["dim"] = "0" + arg_map["index"] = "at::randint(0, 10, {16}, at::kInt)" + arg_map["source"] = "at::rand({16})" + arg_map["alpha"] = "2" + return + if op_name == "index_copy": + if index == 0: + arg_map["self"] = "at::rand({2})" + arg_map["dim"] = "0" + arg_map["index"] = "at::randint(0, 1, {2}, at::kLong)" + arg_map["source"] = "at::rand({2})" + else: + arg_map["self"] = "at::rand({32})" + arg_map["dim"] = "0" + arg_map["index"] = "at::randint(0, 10, {32}, at::kLong)" + arg_map["source"] = "at::rand({32})" + return + if op_name == "linalg_cross": + if index == 0: + arg_map["self"] = "at::rand({6, 3, 6})" + arg_map["other"] = "at::rand({6, 3, 6})" + arg_map["dim"] = "1" + else: + arg_map["self"] = "at::rand({22, 3, 22})" + arg_map["other"] = "at::rand({22, 3, 22})" + arg_map["dim"] = "1" + return + if op_name == "nll_loss_backward": + if index == 0: + arg_map["grad_output"] = "at::rand({})" + arg_map["self"] = "at::rand({6})" + arg_map["target"] = "at::randint(0, 5, {6}, torch::kInt64)" + arg_map["weight"] = "at::rand({6})" + arg_map["reduction"] = "1" + arg_map["ignore_index"] = "1" + arg_map["total_weight"] = "at::rand({})" + else: + arg_map["grad_output"] = "at::rand({})" + arg_map["self"] = "at::rand({36})" + arg_map["target"] = "at::randint(0, 11, {36}, torch::kInt64)" + arg_map["weight"] = "at::rand({36})" + arg_map["reduction"] = "1" + arg_map["ignore_index"] = "1" + arg_map["total_weight"] = "at::rand({})" + return + if op_name in ["scatter", "scatter_add", "_scatter_reduce"]: + if index == 0: + arg_map["self"] = "at::randint(1, 100, {2,2,2}, torch::kInt64)" + arg_map["index"] = "at::randint(0, 1, {2,2,2}, torch::kInt64)" + arg_map["src"] = "at::randint(1, 100, {2,2,2}, torch::kInt64)" + else: + arg_map["self"] = "at::randint(1, 100, {5,5,5}, torch::kInt64)" + arg_map["index"] = "at::randint(0, 1, {5,5,5}, torch::kInt64)" + arg_map["src"] = "at::randint(1, 100, {5,5,5}, torch::kInt64)" + if "reduce" in arg_map: + arg_map["reduce"] = '"sum"' if op_name == "_scatter_reduce" else '"add"' + return + if op_name == "scatter_reduce": + arg_map["reduce"] = '"mean"' + if index == 0: + arg_map["index"] = "at::randint(6, {6, 6, 6}, torch::kInt64)" + else: + arg_map["index"] = "at::randint(22, {22, 22, 22}, torch::kInt64)" + return + if op_name == "special_zeta": + if index == 0: + arg_map["self"] = "at::rand({2,2,2}, at::kDouble) + at::ones({2,2,2})" + arg_map["other"] = "at::rand({2,2,2}, at::kDouble) + at::ones({2,2,2})" + else: + arg_map["self"] = "at::rand({5,5,5}, at::kDouble) + at::ones({5,5,5})" + arg_map["other"] = "at::rand({5,5,5}, at::kDouble) + at::ones({5,5,5})" + return + if op_name == "_convert_indices_from_csr_to_coo": + if index == 0: + arg_map["crow_indices"] = "torch::tensor({1}, torch::kInt32)" + arg_map["col_indices"] = "torch::tensor({0, 1, 0}, torch::kInt32)" + arg_map["out_int32"] = "false" + else: + arg_map["crow_indices"] = "torch::tensor({0}, torch::kInt32)" + arg_map["col_indices"] = ( + "torch::tensor({0, 1, 0, 2, 1, 2, 0, 1, 0, 2, 1, 2}, torch::kInt32)" + ) + arg_map["out_int32"] = "false" + return + if op_name == "_convert_indices_from_coo_to_csr": + if index == 0: + arg_map["self"] = "at::randint(0, 3, {2}, at::kInt)" + arg_map["size"] = "10" + arg_map["out_int32"] = "false" + else: + arg_map["self"] = 
"at::randint(0, 3, {12}, at::kInt)" + arg_map["size"] = "24" + arg_map["out_int32"] = "false" + return + if op_name in ("diagonal", "linalg_diagonal"): + arg_map["offset"] = "0" + arg_map["dim1"] = "2" + arg_map["dim2"] = "1" + return diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/gen_static_runtime_ops.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/gen_static_runtime_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..d6909bc4d7f67fc13fb9f61e00f4709a4ff5ad4e --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/gen_static_runtime_ops.py @@ -0,0 +1,231 @@ +from __future__ import annotations + +import argparse +import itertools +import os +from typing import TYPE_CHECKING, TypeVar + +from libfb.py.log import set_simple_logging # type: ignore[import] + +from torchgen import gen +from torchgen.context import native_function_manager +from torchgen.model import DispatchKey, NativeFunctionsGroup, NativeFunctionsViewGroup +from torchgen.static_runtime import config, generator + + +if TYPE_CHECKING: + from collections.abc import Sequence + + +# Given a list of `grouped_native_functions` sorted by their op names, return a list of +# lists each of which groups ops that share the base name. For example, `mean` and +# `mean.dim` are grouped together by this function. + +NativeGroupT = TypeVar( + "NativeGroupT", + bound=NativeFunctionsGroup | NativeFunctionsViewGroup, +) + + +def group_functions_by_op_name( + grouped_native_functions: Sequence[NativeGroupT], +) -> Sequence[Sequence[NativeGroupT]]: + if not grouped_native_functions: + return [] + groups = [] + + def is_supported(g: NativeFunctionsGroup | NativeFunctionsViewGroup) -> bool: + with native_function_manager(g): + return generator.is_supported(g) + + eligible_ops = (g for g in grouped_native_functions if is_supported(g)) + groups = [ + list(group) + for k, group in ( + itertools.groupby( + eligible_ops, + key=config.func_name_base_str, + ) + ) + ] + + return groups + + +def clang_format(cpp_file_path: str) -> None: + import subprocess + + subprocess.check_call(["clang-format", "-i", cpp_file_path]) + + +def write_cpp(cpp_ops: Sequence[str], file_path: str) -> None: + code = "\n".join(cpp_ops) + generated = f"""// @lint-ignore-every CLANGTIDY HOWTOEVEN +// AUTO-GENERATED FROM: torchgen/static_runtime/gen_static_runtime_ops.py +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace torch {{ +namespace jit {{ + +{code} + +}} // namespace jit +}} // namespace torch +""" + with open(file_path, "w") as f: + f.write(generated) + clang_format(file_path) + + +def write_test_cpp(cpp_ops: Sequence[str], file_path: str) -> None: + code = "\n".join(cpp_ops) + generated = f"""// @lint-ignore-every CLANGTIDY HOWTOEVEN +// AUTO-GENERATED FROM: torchgen/static_runtime/gen_static_runtime_ops.py +#include +#include +#include + +#include "test_utils.h" + +using namespace caffe2; +using namespace torch; +using namespace torch::jit; +using namespace torch::jit::test; +using c10::IValue; + +{code} + +""" + with open(file_path, "w") as f: + f.write(generated) + clang_format(file_path) + + +def main() -> None: + parser = 
argparse.ArgumentParser(description="Generate ATen source files") + parser.add_argument( + "-s", + "--source-path", + help="path to source directory for ATen", + default="caffe2/aten/src/ATen", + ) + parser.add_argument( + "-p", + "--generated-ops-cpp-path", + help="path to directory to generate op dispatcher .cpp file", + default="caffe2/torch/csrc/jit/runtime/static/generated_ops.cpp", + ) + parser.add_argument( + "-t", + "--generated-ops-test-cpp-path", + help="path to directory to generate op dispatcher .cpp file", + default="caffe2/benchmarks/static_runtime/test_generated_ops.cc", + ) + options = parser.parse_args() + native_yaml_path = os.path.join(options.source_path, "native/native_functions.yaml") + tags_yaml_path = os.path.join(options.source_path, "native/tags.yaml") + parsed_yaml = gen.parse_native_yaml(native_yaml_path, tags_yaml_path) + native_functions, backend_indices = ( + parsed_yaml.native_functions, + parsed_yaml.backend_indices, + ) + + op_generator = generator.GenOpDispatcher() + test_case_generator = generator.GenOpTestCase() + + native_functions_groups = [ + g + for g in gen.get_grouped_native_functions(native_functions) + if isinstance(g, NativeFunctionsGroup) + ] + + supported_functions_groups = group_functions_by_op_name(native_functions_groups) + + out_variant_op_result = [ + op_generator.out_variant(groups, backend_indices[DispatchKey.CPU]) + for groups in supported_functions_groups + ] + out_variant_test_result = [ + test_case_generator.out_variant(groups) for groups in supported_functions_groups + ] + + native_functions_view_groups = [ + g + for g in gen.get_grouped_by_view_native_functions(native_functions) + if isinstance(g, NativeFunctionsViewGroup) + ] + + supported_functions_view_groups = group_functions_by_op_name( + native_functions_view_groups + ) + + view_op_result = [ + op_generator.view(groups, backend_indices[DispatchKey.CPU]) + for groups in supported_functions_view_groups + ] + view_test_result = [ + test_case_generator.view(groups) for groups in supported_functions_view_groups + ] + + op_result = out_variant_op_result + ["\n\n"] + view_op_result + test_result = out_variant_test_result + ["\n\n"] + view_test_result + + write_cpp(op_result, options.generated_ops_cpp_path) + write_test_cpp(test_result, options.generated_ops_test_cpp_path) + + print( + f"\ntotal grouped native ops: {len(gen.get_grouped_native_functions(native_functions)):d}" + ) + + print(f"grouped native ops with out variant: {len(native_functions_groups):d}") + supported_functions_num = sum(len(groups) for groups in supported_functions_groups) + print(f"generated functions groups with out variant: {supported_functions_num:d}") + + print(f"\nview grouped native ops: {len(native_functions_view_groups):d}") + supported_view_functions_num = sum( + len(groups) for groups in supported_functions_view_groups + ) + print(f"generated functions view groups: {supported_view_functions_num:d}") + + print( + f"\noverall generated : {supported_functions_num + supported_view_functions_num:d}" + ) + + +if __name__ == "__main__": + set_simple_logging(escape_newlines=False) + main() diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/generator.py b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/generator.py new file mode 100644 index 0000000000000000000000000000000000000000..8ad2fd3c458892568429f86e5cd53c26982b38fd --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/torchgen/static_runtime/generator.py @@ -0,0 +1,814 @@ +from 
__future__ import annotations + +import json +import logging +import math +from typing import TYPE_CHECKING + +import torchgen.api.cpp as cpp +from torchgen.context import native_function_manager +from torchgen.model import ( + Argument, + BackendIndex, + BaseTy, + BaseType, + FunctionSchema, + NativeFunctionsGroup, + NativeFunctionsViewGroup, + OptionalType, + SelfArgument, + TensorOptionsArguments, + Type, +) +from torchgen.static_runtime import config + + +if TYPE_CHECKING: + from collections.abc import Sequence + + +logger: logging.Logger = logging.getLogger() + + +def has_alias( + arguments: Sequence[Argument | SelfArgument | TensorOptionsArguments], +) -> bool: + for arg in arguments: + annotation = getattr(arg, "annotation", None) + if not annotation: + continue + alias_set = getattr(annotation, "alias_set", ()) + if alias_set: + return True + return False + + +BLOCKED_OPS = frozenset( + ( + # non cpu ops + "sparse_sampled_addmm", + "hspmm", + "linalg_svdvals", + # sparse ops + "sspaddmm", + "coalesce", + "_indices", + "indices", + "_values", + "values", + "crow_indices", + "col_indices", + # deprecated ops + "floor_divide", + "ger", + # buggy ops + "conj_physical", # P495807361 + "binary_cross_entropy", # P496394764 + "arccosh", + # uncommon ops + "cholesky", + "lu_solve", + "linalg_cholesky", + "linalg_householder_product", + "linalg_ldl_solve", + "_compute_linear_combination", + # training related ops + "_make_dual", + # cannot call directly + "_fw_primal", + # no documentation + "_index_reduce", + # TODO: these ones got added recently and need manual inspection + "_new_zeros_with_same_feature_meta", + "_conj_physical", + "binary_cross_entropy_with_logits", + "bincount", + "conv_tbc", + "copy", + "_copy_from", + "_copy_from_and_resize", + "count_nonzero", + "cudnn_affine_grid_generator", + "cudnn_affine_grid_generator_backward", + "cudnn_grid_sampler", + "diag_embed", + "embedding", + "embedding_dense_backward", + "_embedding_bag_dense_backward", + "_embedding_bag_per_sample_weights_backward", + "grid_sampler_2d", + "_grid_sampler_2d_cpu_fallback", + "grid_sampler_3d", + "isnan", + "mkldnn_linear", + "median", + "nanmedian", + "_sparse_sparse_matmul", + "batch_norm_backward_elemt", + "_euclidean_dist", + "pixel_shuffle", + "pixel_unshuffle", + "channel_shuffle", + "_reshape_nested_backward", + "relu", + "prelu", + "celu", + "slice_scatter", + "select_scatter", + "diagonal_scatter", + "sum", + "_mkldnn_transpose", + "_nested_tensor_from_mask", + "_nested_from_padded", + "_nested_tensor_size", + "_nested_from_padded_and_nested_example", + "_standard_gamma_grad", + "_dirichlet_grad", + "native_norm", + "_sparse_softmax", + "_sparse_softmax_backward_data", + "_sparse_log_softmax", + "_sparse_log_softmax_backward_data", + "zero", + "_sparse_addmm", + "sparse_mask", + "_sparse_mask_projection", + "_to_dense", + "_coalesce", + "_coalesced", + "copy_sparse_to_sparse", + "to_sparse", + "to_sparse_csr", + "to_sparse_csc", + "to_mkldnn", + "quantize_per_tensor_dynamic", + "quantize_per_channel", + "q_per_channel_scales", + "q_per_channel_zero_points", + "int_repr", + "_make_per_channel_quantized_tensor", + "set", + "lift", + "lift_fresh", + "lift_fresh_copy", + "masked_scatter", + "_masked_softmax", + "_masked_softmax_backward", + "put", + "index_reduce", + "trace", + "_cholesky_solve_helper", + "dist", + "max", + "_torch_cuda_cu_linker_symbol_op", + "glu_jvp", + "glu_backward_jvp", + "hardswish_backward", + "rrelu_with_noise_backward", + "mkldnn_adaptive_avg_pool2d_backward", + 
"_adaptive_avg_pool2d_backward", + "_adaptive_avg_pool3d_backward", + "isinf", + "linalg_lu_solve", + "linalg_vecdot", + "linalg_matrix_exp", + "linalg_eigvalsh", + "_test_warn_in_autograd", + "_test_autograd_multiple_dispatch_view", + "_test_autograd_multiple_dispatch_view_copy", + "_segment_reduce", + "_segment_reduce_backward", + "_fw_primal_copy", + "_make_dual_copy", + "view_as_real_copy", + "view_as_complex_copy", + "_conj_copy", + "_neg_view_copy", + "diagonal_copy", + "detach_copy", + "squeeze_copy", + "t_copy", + "unsqueeze_copy", + "_indices_copy", + "_values_copy", + "indices_copy", + "values_copy", + "crow_indices_copy", + "col_indices_copy", + "ccol_indices", + "ccol_indices_copy", + "row_indices", + "row_indices_copy", + "unfold_copy", + "alias_copy", + "_triton_multi_head_attention", + "special_airy_ai", + "special_bessel_j0", + "special_bessel_j1", + "special_bessel_y0", + "special_bessel_y1", + "special_chebyshev_polynomial_t", + "special_chebyshev_polynomial_u", + "special_chebyshev_polynomial_v", + "special_chebyshev_polynomial_w", + "special_hermite_polynomial_h", + "special_hermite_polynomial_he", + "special_laguerre_polynomial_l", + "special_legendre_polynomial_p", + "special_modified_bessel_i0", + "special_modified_bessel_i1", + "special_modified_bessel_k0", + "special_modified_bessel_k1", + "special_scaled_modified_bessel_k0", + "special_scaled_modified_bessel_k1", + "special_shifted_chebyshev_polynomial_t", + "special_shifted_chebyshev_polynomial_u", + "special_shifted_chebyshev_polynomial_v", + "special_shifted_chebyshev_polynomial_w", + "special_spherical_bessel_j0", + "_foobar", + "_nested_tensor_strides", + "_nested_tensor_storage_offsets", + "_nested_get_values", # no CPU backend + "_nested_get_values_copy", # no CPU backend + "_nested_view_from_jagged", # testing needs to be patched + "_nested_view_from_jagged_copy", # testing needs to be patched + "_nested_view_from_buffer", # testing needs to be patched + "_nested_view_from_buffer_copy", # testing needs to be patched + "_int_mm", # testing needs to be patched + "_to_sparse_csc", # testing needs to be patched + "_to_sparse_csr", # testing needs to be patched + "segment_reduce", # testing needs to be patched + ) +) + + +def is_supported(g: NativeFunctionsGroup | NativeFunctionsViewGroup) -> bool: + base_op_name = "" + func = None + if isinstance(g, NativeFunctionsViewGroup): + base_op_name = g.view.root_name + func = g.view.func + else: + base_op_name = g.out.func.name.name.base + func = g.out.func + if config.is_hand_written(g): + logger.info("HAND WRITTEN: %s", base_op_name) + return False + if base_op_name in BLOCKED_OPS: + logger.info("BLOCKED: %s", base_op_name) + return False + for arg in func.schema_order_arguments(): + maybe_method = ivalue_type_conversion_method(arg.type) + if not maybe_method: + # Type converting is unsupported yet. + logger.info("NOT SUPPORTED TYPE CONVERTING: %s", func) + return False + + if isinstance(g, NativeFunctionsViewGroup): + # TODO: stop doing type tests by converting to C++ and then testing + # the string, just test the dang thing directly + if "at::Tensor" != cpp.returns_type(func.returns, symint=False).cpp_type(): + # Returns a non-Tensor value. + logger.info("NON-TENSOR RET TYPE: %s", str(func)) + return False + return True + + # For out variant ops, we need to check the arguments of its functional func. 
+ for arg in g.functional.func.schema_order_arguments(): + maybe_method = ivalue_type_conversion_method(arg.type) + if not maybe_method: + # Type converting is unsupported yet. + logger.info("NOT SUPPORTED TYPE CONVERTING: %s", g.functional.func) + return False + + if not g.structured: + # In case of unstructured op, we check if it has out variant implementation. + # The out variant implementation satisfies the minimum requirement that it has the output tensor as the last + # parameter. + if ( + not hasattr(g, "out") + or not str(func).endswith("Tensor(a!) out) -> Tensor(a!)") + or not str(func.name).endswith(".out") + ): + return False + # TODO: stop type testing by converting to C++ + if "at::Tensor &" != cpp.returns_type(func.returns, symint=False).cpp_type(): + logger.info("NON_TENSOR RET TYPE: %s", func) + return False + if has_alias(func.arguments.non_out): + # This op may create an alias of inputs. + logger.info("INPUTS ALIAS: %s", base_op_name) + return False + return True + + +def ivalue_type_conversion_method( + arg_type: BaseType | OptionalType | Type, +) -> tuple[bool, str] | None: + """ + Return the method call expression of `c10::ivalue' to convert its contained value to + the expected value of `arg_type` type. For example, for `arg_type` == BaseTy.Tensor, + this function returns ".toTensor()", so that it can be appended to the ivalue's + variable name to get the value of the expected type. + """ + type_conversion_methods = { + BaseTy.Tensor: ((True, "toTensor()"), (False, "toOptional<at::Tensor>()")), + BaseTy.int: ((False, "toInt()"), (False, "toOptional<int64_t>()")), + BaseTy.bool: ((False, "toBool()"), (False, "toOptional<bool>()")), + BaseTy.Scalar: ((False, "toScalar()"), (False, "toOptional<at::Scalar>()")), + BaseTy.ScalarType: ( + (False, "toScalarType()"), + (False, "toOptional<at::ScalarType>()"), + ), + BaseTy.str: ( + (False, "toStringView()"), + (False, "toOptional<c10::string_view>()"), + (False, "toOptional<::std::string_view>()"), + ), + } + + base_ty_object = None + if isinstance(arg_type, BaseType): + base_ty_object = arg_type.name + elif isinstance(arg_type, OptionalType): + if not isinstance(arg_type.elem, BaseType): + # ListType is currently unsupported.
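+            # e.g. an optional list argument such as "int[1]? dim" (as in sum.dim_IntList)
+            # takes this path and makes the op unsupported.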
+ return None + base_ty_object = arg_type.elem.name + else: + return None + + if base_ty_object not in type_conversion_methods: + return None + methods = type_conversion_methods[base_ty_object] + if isinstance(arg_type, BaseType): + return methods[0] + return methods[1] + + +should_use_int_tensor_ops_ = frozenset( + ( + "bitwise_not", + "bitwise_and", + "bitwise_or", + "bitwise_xor", + "bitwise_left_shift", + "bitwise_right_shift", + "gcd", + "lcm", + "scatter", + "gather", + "_convert_indices_from_coo_to_csr", + "_convert_indices_from_csr_to_coo", + ) +) +should_use_complex_tensor_ops_ = frozenset(("view_as_real", "imag", "_conj")) + + +def should_use_int_tensor(op_name: str) -> bool: + return op_name in should_use_int_tensor_ops_ + + +def should_use_complex_tensor(op_name: str) -> bool: + return op_name in should_use_complex_tensor_ops_ + + +test_tensor_dim_ops_1_ = frozenset( + ( + "addmv", + "index_add", + "_convert_indices_from_coo_to_csr", + "_convert_indices_from_csr_to_coo", + "nll_loss_backward", + "dot", + "vdot", + "outer", + "ger", + ) +) +test_tensor_dim_ops_2_ = frozenset( + ("addmm", "mm", "nuclear_norm", "diag", "_addmm_activation", "matrix_H", "t") +) + + +def test_tensor_dim(op_name: str) -> int: + if op_name in test_tensor_dim_ops_1_: + return 1 + if op_name in test_tensor_dim_ops_2_: + return 2 + return 3 + + +test_tensor_shapes_string = '{"view_as_complex": "{2, 2}"}' +test_tensor_shape_json: dict[str, str] = json.loads(test_tensor_shapes_string) + + +def test_tensor_shape(op_name: str) -> str: + if op_name in test_tensor_shape_json: + return test_tensor_shape_json[op_name] + else: + return "" + + +def test_value_expression( + arg_type: BaseType | OptionalType | Type, index: int, op_name: str +) -> str: + tensor_size_ex = test_tensor_shape(op_name) + if tensor_size_ex == "": + num_tensors = 16 if index == 0 else 64 + num_dim = test_tensor_dim(op_name) + size_per_dim = math.ceil(num_tensors / float(num_dim)) + size_per_dim += size_per_dim % 2 + tensor_size_ex = "{{{}}}".format(",".join([f"{size_per_dim}"] * num_dim)) + if should_use_int_tensor(op_name): + tensor_expression = f"at::randint(1, 100, {tensor_size_ex}, at::kInt)" + elif should_use_complex_tensor(op_name): + tensor_expression = f"at::randn({tensor_size_ex}, at::kComplexFloat)" + else: + tensor_expression = f"at::rand({tensor_size_ex})" + + value_expressions = { + BaseTy.Tensor: tensor_expression, + BaseTy.int: "1", + BaseTy.bool: "false", + BaseTy.Scalar: "2", + BaseTy.ScalarType: "at::ScalarType::Float", + BaseTy.str: '"floor"', + } + + base_ty_object = None + if isinstance(arg_type, BaseType): + base_ty_object = arg_type.name + else: + assert isinstance(arg_type, OptionalType) and isinstance( + arg_type.elem, BaseType + ) + base_ty_object = arg_type.elem.name + assert base_ty_object in value_expressions, "not expected type" + value_expression = value_expressions[base_ty_object] + return value_expression + + +def generate_test_value_definitions(schema: FunctionSchema, index: int) -> str: + assert not schema.is_out_fn() + schema_name = schema.name.name.base + arg_map = {} + for arg in schema.schema_order_arguments(): + test_value_exp = test_value_expression(arg.type, index, schema_name) + arg_map[arg.name] = test_value_exp + config.override_test_values(arg_map, schema_name, index) + arg_populations = [] + for arg_name, arg_value in arg_map.items(): + arg_populations.append(f"auto {arg_name}{index} = {arg_value}") + return ";\n ".join(arg_populations) + ";" + + +def generate_test_value_names(schema: 
FunctionSchema, index: int) -> str: + assert not schema.is_out_fn() + return ",".join(f"{arg.name}{index}" for arg in schema.schema_order_arguments()) + + +generate_test_ir_arguments_base_ty_to_type_str_ = { + BaseTy.Tensor: "Tensor", + BaseTy.int: "int", + BaseTy.float: "float", + BaseTy.str: "str", + BaseTy.Scalar: "int", + BaseTy.ScalarType: "int", + BaseTy.bool: "bool", +} + + +def generate_test_ir_arguments( + schema: FunctionSchema, +) -> list[tuple[str, str | None]]: + def ir_argument(arg: Argument) -> tuple[str, str | None]: + t = arg.type + add_optional = False + if isinstance(t, OptionalType): + t = t.elem + add_optional = True + assert isinstance(t, BaseType) + type_str = None + if t.name in generate_test_ir_arguments_base_ty_to_type_str_: + type_str = generate_test_ir_arguments_base_ty_to_type_str_[t.name] + if type_str and add_optional: + type_str = f"{type_str}?" + return ("%" + arg.name, type_str) + + return [ir_argument(arg) for arg in schema.schema_order_arguments()] + + +def generate_arg_extraction(schema: FunctionSchema) -> str: + arg_populations = [] + for i, arg in enumerate(schema.schema_order_arguments()): + maybe_method = ivalue_type_conversion_method(arg.type) + assert maybe_method + is_reference, type_conversion_method = maybe_method + reference = "&" if is_reference else "" + arg_populations.append( + f"const auto{reference} {arg.name} = p_node->Input({i}).{type_conversion_method}" + ) + return ";\n ".join(arg_populations) + ";" + + +def get_kernel_name(g: NativeFunctionsGroup, backend_index: BackendIndex) -> str: + kernel = backend_index.get_kernel(g.functional) + if g.structured or kernel is None: + return cpp.name(g.functional.func) + return kernel.kernel + + +def get_out_kernel_name(g: NativeFunctionsGroup, backend_index: BackendIndex) -> str: + kernel = backend_index.get_kernel(g.out) + if g.structured or kernel is None: + return cpp.name(g.out.func) + return kernel.kernel + + +def generate_non_out_variant_call( + g: NativeFunctionsGroup, backend_index: BackendIndex +) -> str: + schema = g.functional.func + assert not schema.is_out_fn() + kernel_name = get_kernel_name(g, backend_index) + arg_names = (arg.name for arg in schema.schema_order_arguments()) + namespace_name = "cpu" if g.structured else "native" + return f"at::{namespace_name}::{kernel_name}({','.join(arg_names)})" + + +def generate_call_to_view_ops( + g: NativeFunctionsViewGroup, backend_index: BackendIndex +) -> str: + schema = g.view.func + kernel_name = cpp.name(schema) + kernel = backend_index.get_kernel(g.view) + if kernel: + kernel_name = kernel.kernel + arg_names = (arg.name for arg in schema.schema_order_arguments()) + namespace_name = "native" + return f"at::{namespace_name}::{kernel_name}({','.join(arg_names)})" + + +def generate_out_variant_call( + g: NativeFunctionsGroup, backend_index: BackendIndex +) -> str: + schema = g.out.func + assert schema.is_out_fn() + arg_names = [] + kernel_name = get_out_kernel_name(g, backend_index) + if g.structured: + # structured op starts with the output tensor argument. 
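+        # e.g. for a structured group like "add.out", the generated call is roughly
+        # at::cpu::add_out(out, self, other, alpha) -- an illustration only; the actual
+        # kernel name is resolved by get_out_kernel_name() above.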
+ arg_names = [out_arg.name for out_arg in schema.arguments.out] + else: + arg_names = [] + for arg in schema.arguments.non_out: + if isinstance(arg, SelfArgument): + arg_names.append(arg.argument.name) + else: + assert isinstance(arg, Argument) + arg_names.append(arg.name) + if not g.structured: + assert len(schema.arguments.out) == 1 + arg_names.append(schema.arguments.out[0].name) + cpp_arg_names = ",".join(arg_names) + namespace_name = "cpu" if g.structured else "native" + return f"at::{namespace_name}::{kernel_name}({cpp_arg_names})" + + +no_memory_resize_ops = frozenset( + ( + "isin.Scalar_Tensor", + "index_add", + "dot", + "vdot", + "nuclear_norm", + "histc", + "l1_loss", + "multi_margin_loss", + "multilabel_margin_loss", + "nll_loss", + "nll_loss2d", + "prod", + ) +) + + +def should_check_resize(schema: FunctionSchema) -> bool: + schema_str = str(schema) + type_variant_op_name = schema_str[: schema_str.find("(")] + return type_variant_op_name not in no_memory_resize_ops + + +def op_name_from_group(g: NativeFunctionsGroup) -> str: + return g.functional.func.name.name.base + + +class GenOpDispatcher: + def out_variant( + self, groups: Sequence[NativeFunctionsGroup], backend_index: BackendIndex + ) -> str: + if not groups: + return "" + generated_type_variants = [] + for g in groups: + with native_function_manager(g): + assert is_supported(g) + assert isinstance(g, NativeFunctionsGroup) + generated_type_variant = self.out_variant_op_generator(g, backend_index) + generated_type_variants.append(generated_type_variant) + op_name = op_name_from_group(groups[0]) + body = "\n".join(generated_type_variants) + generated = f""" +REGISTER_OPERATOR_FUNCTOR( + aten::{op_name}, + aten_{op_name}, + [](Node* n) -> SROperator {{ + {body} + LogAndDumpSchema(n); + return nullptr; + }}) +""" + return generated + + def view( + self, groups: Sequence[NativeFunctionsViewGroup], backend_index: BackendIndex + ) -> str: + if not groups: + return "" + generated_type_variants = [] + for g in groups: + with native_function_manager(g): + assert is_supported(g) + assert isinstance(g, NativeFunctionsViewGroup) + generated_type_variant = self.view_op_generator(g, backend_index) + generated_type_variants.append(generated_type_variant) + op_name = config.func_name_base_str(groups[0]) + body = "\n".join(generated_type_variants) + generated = f""" +REGISTER_NATIVE_OPERATOR_FUNCTOR( + aten::{op_name}, + aten_{op_name}, + [](Node* n) -> SROperator {{ + {body} + LogAndDumpSchema(n); + return nullptr; + }}); +""" + return generated + + def out_variant_op_generator( + self, g: NativeFunctionsGroup, backend_index: BackendIndex + ) -> str: + functional = g.functional + schema = str(functional.func) + populated_argument = generate_arg_extraction(g.functional.func) + functional_variant_call = generate_non_out_variant_call(g, backend_index) + assert len(g.out.func.arguments.out) == 1 + out_variable_name = str(g.out.func.arguments.out[0].name) + out_variant_call = generate_out_variant_call(g, backend_index) + generated = f""" + if (n->matches(torch::schema("aten::{schema}"))) {{ + return [](ProcessedNode* p_node) {{ + {populated_argument} + if (p_node->Output(0).isNone()) {{ + p_node->Output(0) = {functional_variant_call}; + return; + }} + auto& {out_variable_name} = p_node->Output(0).toTensor(); + fastResizeToZero({out_variable_name}); + {out_variant_call}; + }}; + }}""" + return generated + + def view_op_generator( + self, g: NativeFunctionsViewGroup, backend_index: BackendIndex + ) -> str: + schema = str(g.view.func) + 
populated_argument = generate_arg_extraction(g.view.func) + functional_variant_call = generate_call_to_view_ops(g, backend_index) + generated = f""" + if (n->matches(torch::schema("aten::{schema}"))) {{ + return [](ProcessedNode* p_node) {{ + {populated_argument} + p_node->Output(0) = {functional_variant_call}; + }}; + }}""" + return generated + + +class GenOpTestCase: + def out_variant(self, groups: Sequence[NativeFunctionsGroup]) -> str: + if not groups: + return "" + generated_type_variants = [] + for g in groups: + with native_function_manager(g): + assert is_supported(g) + assert isinstance(g, NativeFunctionsGroup) + generated_type_variant = self.out_variant_op_test_case_generator(g) + generated_type_variants.append(generated_type_variant) + return "\n".join(generated_type_variants) + + def view(self, groups: Sequence[NativeFunctionsViewGroup]) -> str: + if not groups: + return "" + generated_type_variants = [] + for g in groups: + with native_function_manager(g): + assert is_supported(g) + assert isinstance(g, NativeFunctionsViewGroup) + generated_type_variant = self.view_op_test_case_generator(g) + generated_type_variants.append(generated_type_variant) + return "\n".join(generated_type_variants) + + def out_variant_op_test_case_generator(self, g: NativeFunctionsGroup) -> str: + schema = g.functional.func + schema_str = str(schema) + assert schema_str.find("(") > 0 + type_variant_op_name = schema_str[: schema_str.find("(")].replace(".", "_") + op_name = op_name_from_group(g) + assert type_variant_op_name.startswith(op_name) + + arg_types = generate_test_ir_arguments(schema) + arg_declarations = ", ".join( + ( + arg_name if arg_type is None else f"{arg_name}: {arg_type}" + for arg_name, arg_type in arg_types + ) + ) + arg_names = ", ".join((arg_name for arg_name, _ in arg_types)) + assert ( + len(schema.returns) == 1 + and isinstance(schema.returns[0].type, BaseType) + and schema.returns[0].type.name is BaseTy.Tensor + ) + test_value_definitions = generate_test_value_definitions(schema, 0) + test_value_names = generate_test_value_names(schema, 0) + test_value_definitions2 = generate_test_value_definitions(schema, 1) + test_value_names2 = generate_test_value_names(schema, 1) + check_resize = "true" if should_check_resize(schema) else "false" + generated = f""" +TEST(StaticRuntime, autogen_{type_variant_op_name}) {{ + const std::string script = R"IR( + graph({arg_declarations}): + %bias: None = prim::Constant() + %ret = aten::{op_name}({arg_names}) + %cloned = aten::clone(%ret, %bias) + return (%cloned) + )IR"; + + {test_value_definitions} + std::vector<IValue> args{{{test_value_names}}}; + testStaticRuntime(script, args, {{}}, /*use_allclose=*/false, /*use_equalnan=*/false, /*check_resize=*/{check_resize}); + + {test_value_definitions2} + std::vector<IValue> args2{{{test_value_names2}}}; + testStaticRuntime(script, args, args2, /*use_allclose=*/false, /*use_equalnan=*/false, /*check_resize=*/{check_resize}); + +}} +""" + return generated + + def view_op_test_case_generator(self, g: NativeFunctionsViewGroup) -> str: + schema = g.view.func + schema_str = str(schema) + assert schema_str.find("(") > 0 + type_variant_op_name = schema_str[: schema_str.find("(")].replace(".", "_") + op_name = g.view.root_name + assert type_variant_op_name.startswith(op_name) + + arg_types = generate_test_ir_arguments(schema) + arg_declarations = ", ".join( + ( + arg_name if arg_type is None else f"{arg_name}: {arg_type}" + for arg_name, arg_type in arg_types + ) + ) + arg_names = ", ".join((arg_name for arg_name, _ in 
arg_types)) + assert ( + len(schema.returns) == 1 + and isinstance(schema.returns[0].type, BaseType) + and schema.returns[0].type.name is BaseTy.Tensor + ) + test_value_definitions = generate_test_value_definitions(schema, 0) + test_value_names = generate_test_value_names(schema, 0) + generated = f""" +TEST(StaticRuntime, autogen_{type_variant_op_name}) {{ + const std::string script = R"IR( + graph({arg_declarations}): + %bias: None = prim::Constant() + %ret = aten::{op_name}({arg_names}) + %cloned = aten::clone(%ret, %bias) + return (%cloned) + )IR"; + + {test_value_definitions} + std::vector<IValue> args{{{test_value_names}}}; + testStaticRuntime(script, args); +}} +""" + + return generated diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..29063d57c693dfbcd985c96f1b67a1eb076a0e86 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/__pycache__/glob.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/__pycache__/glob.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..7db0daaba9160419069ad54d9b3eb47c387990c8 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/__pycache__/glob.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/compat/__init__.py b/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/compat/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/compat/__pycache__/__init__.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/compat/__pycache__/__init__.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..8e42b3112068d26e08e7e75aa427c074d0ce589a Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/compat/__pycache__/__init__.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/compat/__pycache__/py310.cpython-312.pyc b/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/compat/__pycache__/py310.cpython-312.pyc new file mode 100644 index 0000000000000000000000000000000000000000..1b119d1b2fce8e1333fed1437429c3d00a1c1583 Binary files /dev/null and b/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/compat/__pycache__/py310.cpython-312.pyc differ diff --git a/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/compat/py310.py b/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/compat/py310.py new file mode 100644 index 0000000000000000000000000000000000000000..d5ca53e037b267ff1b196b3b208dea0acdebe4c5 --- /dev/null +++ b/URSA/.venv_ursa/lib/python3.12/site-packages/zipp/compat/py310.py @@ -0,0 +1,11 @@ +import sys +import io + + +def _text_encoding(encoding, stacklevel=2, /): # pragma: no cover + return encoding + + +text_encoding = ( + io.text_encoding if sys.version_info > (3, 10) else _text_encoding # type: ignore +) diff --git a/URSA/.venv_ursa/share/man/man1/isympy.1 b/URSA/.venv_ursa/share/man/man1/isympy.1 new file mode 100644 index 0000000000000000000000000000000000000000..0ff966158a28c5ad1a6cd954e454842b25fdd999 --- /dev/null +++ b/URSA/.venv_ursa/share/man/man1/isympy.1 @@ -0,0 +1,188 @@ +'\" -*- coding: 
us-ascii -*- +.if \n(.g .ds T< \\FC +.if \n(.g .ds T> \\F[\n[.fam]] +.de URL +\\$2 \(la\\$1\(ra\\$3 +.. +.if \n(.g .mso www.tmac +.TH isympy 1 2007-10-8 "" "" +.SH NAME +isympy \- interactive shell for SymPy +.SH SYNOPSIS +'nh +.fi +.ad l +\fBisympy\fR \kx +.if (\nx>(\n(.l/2)) .nr x (\n(.l/5) +'in \n(.iu+\nxu +[\fB-c\fR | \fB--console\fR] [\fB-p\fR ENCODING | \fB--pretty\fR ENCODING] [\fB-t\fR TYPE | \fB--types\fR TYPE] [\fB-o\fR ORDER | \fB--order\fR ORDER] [\fB-q\fR | \fB--quiet\fR] [\fB-d\fR | \fB--doctest\fR] [\fB-C\fR | \fB--no-cache\fR] [\fB-a\fR | \fB--auto\fR] [\fB-D\fR | \fB--debug\fR] [ +-- | PYTHONOPTIONS] +'in \n(.iu-\nxu +.ad b +'hy +'nh +.fi +.ad l +\fBisympy\fR \kx +.if (\nx>(\n(.l/2)) .nr x (\n(.l/5) +'in \n(.iu+\nxu +[ +{\fB-h\fR | \fB--help\fR} +| +{\fB-v\fR | \fB--version\fR} +] +'in \n(.iu-\nxu +.ad b +'hy +.SH DESCRIPTION +isympy is a Python shell for SymPy. It is just a normal python shell +(ipython shell if you have the ipython package installed) that executes +the following commands so that you don't have to: +.PP +.nf +\*(T< +>>> from __future__ import division +>>> from sympy import * +>>> x, y, z = symbols("x,y,z") +>>> k, m, n = symbols("k,m,n", integer=True) + \*(T> +.fi +.PP +So starting isympy is equivalent to starting python (or ipython) and +executing the above commands by hand. It is intended for easy and quick +experimentation with SymPy. For more complicated programs, it is recommended +to write a script and import things explicitly (using the "from sympy +import sin, log, Symbol, ..." idiom). +.SH OPTIONS +.TP +\*(T<\fB\-c \fR\*(T>\fISHELL\fR, \*(T<\fB\-\-console=\fR\*(T>\fISHELL\fR +Use the specified shell (python or ipython) as +console backend instead of the default one (ipython +if present or python otherwise). + +Example: isympy -c python + +\fISHELL\fR could be either +\&'ipython' or 'python' +.TP +\*(T<\fB\-p \fR\*(T>\fIENCODING\fR, \*(T<\fB\-\-pretty=\fR\*(T>\fIENCODING\fR +Setup pretty printing in SymPy. By default, the most pretty, unicode +printing is enabled (if the terminal supports it). You can use less +pretty ASCII printing instead or no pretty printing at all. + +Example: isympy -p no + +\fIENCODING\fR must be one of 'unicode', +\&'ascii' or 'no'. +.TP +\*(T<\fB\-t \fR\*(T>\fITYPE\fR, \*(T<\fB\-\-types=\fR\*(T>\fITYPE\fR +Setup the ground types for the polys. By default, gmpy ground types +are used if gmpy2 or gmpy is installed, otherwise it falls back to python +ground types, which are a little bit slower. You can manually +choose python ground types even if gmpy is installed (e.g., for testing purposes). + +Note that sympy ground types are not supported, and should be used +only for experimental purposes. + +Note that the gmpy1 ground type is primarily intended for testing; it forces the +use of gmpy even if gmpy2 is available. + +This is the same as setting the environment variable +SYMPY_GROUND_TYPES to the given ground type (e.g., +SYMPY_GROUND_TYPES='gmpy') + +The ground types can be determined interactively from the variable +sympy.polys.domains.GROUND_TYPES inside the isympy shell itself. + +Example: isympy -t python + +\fITYPE\fR must be one of 'gmpy', +\&'gmpy1' or 'python'. +.TP +\*(T<\fB\-o \fR\*(T>\fIORDER\fR, \*(T<\fB\-\-order=\fR\*(T>\fIORDER\fR +Setup the ordering of terms for printing. The default is lex, which +orders terms lexicographically (e.g., x**2 + x + 1). You can choose +other orderings, such as rev-lex, which will use reverse +lexicographic ordering (e.g., 1 + x + x**2).
+ +Note that for very large expressions, ORDER='none' may speed up +printing considerably, with the tradeoff that the order of the terms +in the printed expression will have no canonical order. + +Example: isympy -o rev-lex + +\fIORDER\fR must be one of 'lex', 'rev-lex', 'grlex', +\&'rev-grlex', 'grevlex', 'rev-grevlex', 'old', or 'none'. +.TP +\*(T<\fB\-q\fR\*(T>, \*(T<\fB\-\-quiet\fR\*(T> +Print only Python's and SymPy's versions to stdout at startup, and nothing else. +.TP +\*(T<\fB\-d\fR\*(T>, \*(T<\fB\-\-doctest\fR\*(T> +Use the same format that should be used for doctests. This is +equivalent to '\fIisympy -c python -p no\fR'. +.TP +\*(T<\fB\-C\fR\*(T>, \*(T<\fB\-\-no\-cache\fR\*(T> +Disable the caching mechanism. Disabling the cache may slow certain +operations down considerably. This is useful for testing the cache, +or for benchmarking, as the cache can result in deceptive benchmark timings. + +This is the same as setting the environment variable SYMPY_USE_CACHE +to 'no'. +.TP +\*(T<\fB\-a\fR\*(T>, \*(T<\fB\-\-auto\fR\*(T> +Automatically create missing symbols. Normally, typing a name of a +Symbol that has not been instantiated first would raise NameError, +but with this option enabled, any undefined name will be +automatically created as a Symbol. This only works in IPython 0.11. + +Note that this is intended only for interactive, calculator style +usage. In a script that uses SymPy, Symbols should be instantiated +at the top, so that it's clear what they are. + +This will not override any names that are already defined, which +includes the single character letters represented by the mnemonic +QCOSINE (see the "Gotchas and Pitfalls" document in the +documentation). You can delete existing names by executing "del +name" in the shell itself. You can see if a name is defined by typing +"'name' in globals()". + +The Symbols that are created using this have default assumptions. +If you want to place assumptions on symbols, you should create them +using symbols() or var(). + +Finally, this only works in the top level namespace. So, for +example, if you define a function in isympy with an undefined +Symbol, it will not work. +.TP +\*(T<\fB\-D\fR\*(T>, \*(T<\fB\-\-debug\fR\*(T> +Enable debugging output. This is the same as setting the +environment variable SYMPY_DEBUG to 'True'. The debug status is set +in the variable SYMPY_DEBUG within isympy. +.TP +-- \fIPYTHONOPTIONS\fR +These options will be passed on to the \fIipython\fR(1) shell. +Only supported when ipython is being used (standard python shell not supported). + +Two dashes (--) are required to separate \fIPYTHONOPTIONS\fR +from the other isympy options. + +For example, to run iSymPy without startup banner and colors: + +isympy -q -c ipython -- --colors=NoColor +.TP +\*(T<\fB\-h\fR\*(T>, \*(T<\fB\-\-help\fR\*(T> +Print help output and exit. +.TP +\*(T<\fB\-v\fR\*(T>, \*(T<\fB\-\-version\fR\*(T> +Print isympy version information and exit. +.SH FILES +.TP +\*(T<\fI${HOME}/.sympy\-history\fR\*(T> +Saves the history of commands when using the python +shell as backend. +.SH BUGS +The upstream's BTS can be found at \(lahttps://github.com/sympy/sympy/issues\(ra +Please report all bugs that you find there; this will help improve +the overall quality of SymPy.
+.SH "SEE ALSO" +\fBipython\fR(1), \fBpython\fR(1) diff --git a/URSA/diffnext/data/__init__.py b/URSA/diffnext/data/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..90be158a6ac71e82598484e5d0d0be3efe593c25 --- /dev/null +++ b/URSA/diffnext/data/__init__.py @@ -0,0 +1,16 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ------------------------------------------------------------------------ +"""Data components.""" diff --git a/URSA/diffnext/data/flex_loaders.py b/URSA/diffnext/data/flex_loaders.py new file mode 100644 index 0000000000000000000000000000000000000000..5a263adb7e6acfbe4a3d4a7ddfe56fa54093075f --- /dev/null +++ b/URSA/diffnext/data/flex_loaders.py @@ -0,0 +1,172 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ------------------------------------------------------------------------ +"""Flex data loaders.""" + +import collections +import multiprocessing as mp +import time +import threading +import queue + +import codewithgpu +import numpy as np + +from diffnext.data.flex_pipelines import FeatureWorker + + +class BalancedQueues(object): + """Balanced queues.""" + + def __init__(self, base_queue, num=1): + self.queues = [base_queue] + self.queues += [mp.Queue(base_queue._maxsize) for _ in range(num - 1)] + self.index = 0 + + def put(self, obj, block=True, timeout=None): + q = self.queues[self.index] + q.put(obj, block=block, timeout=timeout) + self.index = (self.index + 1) % len(self.queues) + + def get(self, block=True, timeout=None): + q = self.queues[self.index] + obj = q.get(block=block, timeout=timeout) + self.index = (self.index + 1) % len(self.queues) + return obj + + def get_n(self, num=1): + outputs = [] + while len(outputs) < num: + obj = self.get() + if obj is not None: + outputs.append(obj) + return outputs + + +class DataLoaderBase(threading.Thread): + """Base class of data loader.""" + + def __init__(self, worker, **kwargs): + super().__init__(daemon=True) + self.seed = kwargs.pop("seed", 1337) + self.shuffle = kwargs.pop("shuffle", True) + self.shard_id = kwargs.get("shard_id", 0) + self.num_shards = kwargs.get("num_shards", 1) + self.batch_size = kwargs.get("batch_size", 1) + self.num_workers = kwargs.get("num_workers", 1) + self.queue_depth = kwargs.get("queue_depth", 2) + # Build queues. + self.reader_queue = mp.Queue(self.queue_depth * self.batch_size) + self.worker_queue = mp.Queue(self.queue_depth * self.batch_size) + self.batch_queue = queue.Queue(self.queue_depth) + self.reader_queue = BalancedQueues(self.reader_queue, self.num_workers) + self.worker_queue = BalancedQueues(self.worker_queue, self.num_workers) + # Build readers. + self.readers = [ + codewithgpu.DatasetReader( + output_queue=self.reader_queue, + partition_id=self.shard_id, + num_partitions=self.num_shards, + seed=self.seed + self.shard_id, + shuffle=self.shuffle, + **kwargs, + ) + ] + self.readers[0].start() + time.sleep(0.1) + # Build workers. + self.workers = [] + for i in range(self.num_workers): + p = worker() + p.seed = self.seed + i + self.shard_id * self.num_workers + p.reader_queue = self.reader_queue.queues[i] + p.worker_queue = self.worker_queue.queues[i] + p.start() + self.workers.append(p) + time.sleep(0.1) + + # Register cleanup callbacks. + def cleanup(): + def terminate(processes): + for p in processes: + p.terminate() + p.join() + + terminate(self.workers) + terminate(self.readers) + + import atexit + + atexit.register(cleanup) + # Start batch prefetching. + self.start() + + def next(self): + """Return the next batch of data.""" + return self.__next__() + + def run(self): + """Main loop.""" + + def __call__(self): + return self.next() + + def __iter__(self): + """Return the iterator self.""" + return self + + def __next__(self): + """Return the next batch of data.""" + return [self.batch_queue.get()] + + +class DataLoader(DataLoaderBase): + """Loader to return the batch of data.""" + + def __init__(self, dataset, worker, **kwargs): + kwargs.update({"path": dataset}) # Alias for codewithgpu. 
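+        # `contiguous` stacks per-sample latents into one batched ndarray; `prefetch_count`
+        # sets how many transformed samples are buffered ahead of batch assembly (see run()).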
+ self.contiguous = kwargs.pop("contiguous", True) + self.prefetch_count = kwargs.pop("prefetch_count", 50) + super().__init__(worker, **kwargs) + + def run(self): + """Main loop.""" + prev_inputs = self.worker_queue.get_n(self.prefetch_count * self.batch_size) + next_inputs = [] + while True: + # Use cached buffer for next N inputs. + if len(next_inputs) == 0: + next_inputs = prev_inputs + prev_inputs = [] + # Collect the next batch. + outputs = collections.defaultdict(list) + for _ in range(self.batch_size): + inputs = next_inputs.pop(0) + for k, v in inputs.items(): + outputs[k].extend(v) + prev_inputs += self.worker_queue.get_n(1) + # Stack batch data. + if self.contiguous: + if "latents" in outputs: + outputs["latents"] = np.stack(outputs["latents"]) + # Send batch data to consumer. + self.batch_queue.put(outputs) + + +class FeatureDataLoader(DataLoader): + """Loader to return the batch of data features.""" + + def __init__(self, dataset, **kwargs): + super().__init__(dataset, FeatureWorker, **kwargs) diff --git a/URSA/diffnext/data/flex_pipelines.py b/URSA/diffnext/data/flex_pipelines.py new file mode 100644 index 0000000000000000000000000000000000000000..af52db37e8c3113281619922d9d02e5ed57d96a7 --- /dev/null +++ b/URSA/diffnext/data/flex_pipelines.py @@ -0,0 +1,63 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ------------------------------------------------------------------------ +"""Flex data pipelines.""" + +import multiprocessing + +import cv2 +import numpy.random as npr + +from diffnext.data import flex_transforms + + +class Worker(multiprocessing.Process): + """Base data worker.""" + + def __init__(self): + super().__init__(daemon=True) + self.seed = 1337 + self.reader_queue = None + self.worker_queue = None + + def run(self): + """Run implementation.""" + # Disable opencv threading and fix numpy random seed. + cv2.setNumThreads(1), npr.seed(self.seed) + while True: # Main loop. 
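+            # Blocking pipeline step: pull one raw sample from the reader queue,
+            # transform it via get_outputs(), and push the result to the worker queue.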
+ self.worker_queue.put(self.get_outputs(self.reader_queue.get())) + + +class FeaturePipe(object): + """Pipeline to transform data features.""" + + def __init__(self): + super().__init__() + self.parse_latents = flex_transforms.ParseLatents() + self.parse_annotations = flex_transforms.ParseAnnotations() + + def get_outputs(self, inputs): + """Return the outputs.""" + latents = self.parse_latents(inputs) + label, caption = self.parse_annotations(inputs) + outputs = {"latents": [latents]} + outputs.setdefault("prompt", [label]) if label is not None else None + outputs.setdefault("prompt", [caption]) if caption is not None else None + outputs.setdefault("motion", [inputs["flow"]]) if "flow" in inputs else None + return outputs + + +class FeatureWorker(FeaturePipe, Worker): + """Worker to transform data features.""" diff --git a/URSA/diffnext/data/flex_transforms.py b/URSA/diffnext/data/flex_transforms.py new file mode 100644 index 0000000000000000000000000000000000000000..a06f4b409cf04680c5faf1b6dc49975171e00a2a --- /dev/null +++ b/URSA/diffnext/data/flex_transforms.py @@ -0,0 +1,66 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ------------------------------------------------------------------------ +"""Flex data transforms.""" + +import re +import numpy as np +import numpy.random as npr + + +class Transform(object): + """Base transform type.""" + + def filter_outputs(self, *outputs): + outputs = [x for x in outputs if x is not None] + return outputs if len(outputs) > 1 else outputs[0] + + +class ParseLatents(Transform): + """Parse VQ or VAE latents.""" + + def __init__(self): + super().__init__() + + def __call__(self, inputs): + for k, dtype in zip(("moments", "codes"), ("float16", "int32")): + if k in inputs: + return np.frombuffer(inputs[k], dtype).reshape(inputs["shape"]) + raise ValueError("Missing latents in inputs.") + + +class ParseAnnotations(Transform): + """Parse ground-truth annotations.""" + + def __init__(self, short_prob=0.5): + super().__init__() + self.short_prob = short_prob + + def __call__(self, inputs): + text = inputs.get("text", None) + label = inputs.get("label", None) + caption = inputs.get("caption", None) + if caption and isinstance(caption, dict): # Cached. + caption = np.frombuffer(caption["data"], "float16").reshape(caption["shape"]) + if text and isinstance(text, dict) and len(text["data"]) > 0 and npr.rand() < 0.5: + caption = np.frombuffer(text["data"], "float16").reshape(text["shape"]) + return label, caption + + # Improved short caption. 
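+        # With probability `short_prob`, swap in a shorter prompt: either the provided
+        # `text`, or the first sentence of the caption (up to the first '.', '!' or '?').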
+ if label is None: + text_match = re.match(r"^(.*?[.!?])\s+", caption) + text = text if text else (text_match.group(1) if text_match else caption) + caption = text if text and npr.rand() < self.short_prob else caption + return label, caption diff --git a/URSA/diffnext/engine/__init__.py b/URSA/diffnext/engine/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..c40b63827cd0e1958c1bf954d4e6e9a60564e933 --- /dev/null +++ b/URSA/diffnext/engine/__init__.py @@ -0,0 +1,16 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ------------------------------------------------------------------------ +"""Engine components.""" diff --git a/URSA/diffnext/engine/engine_utils.py b/URSA/diffnext/engine/engine_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..c77b2002a22c619e57089d261c04009660e64ba2 --- /dev/null +++ b/URSA/diffnext/engine/engine_utils.py @@ -0,0 +1,109 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+# ------------------------------------------------------------------------ +"""Engine utilities.""" + +import collections +import pickle + +import numpy as np +import torch +from torch import nn + + +def count_params(module, trainable=True, unit="M"): + """Return the number of parameters.""" + counts = [v.size().numel() for v in module.parameters() if v.requires_grad or (not trainable)] + return sum(counts) / {"M": 1e6, "B": 1e9}[unit] + + +def freeze_module(module, trainable=False): + """Freeze parameters of given module.""" + module.eval() if not trainable else module.train() + for param in module.parameters(): + param.requires_grad = trainable + return module + + +def get_device(index): + """Create the available device object.""" + if torch.cuda.is_available(): + return torch.device("cuda", index) + for device_type in ("mps",): + try: + if getattr(torch.backends, device_type).is_available(): + return torch.device(device_type, index) + except AttributeError: + pass + return torch.device("cpu") + + +def get_param_groups(model): + """Separate parameters into groups.""" + memo, groups, lr_scale_getter = set(), collections.OrderedDict(), None + norm_types = (nn.BatchNorm2d, nn.GroupNorm, nn.SyncBatchNorm, nn.LayerNorm) + for module_name, module in model.named_modules(): + for param_name, param in module.named_parameters(recurse=False): + if not param.requires_grad or param in memo: + continue + memo.add(param) + attrs = collections.OrderedDict() + if lr_scale_getter: + attrs["lr_scale"] = lr_scale_getter(f"{module_name}.{param_name}") + if hasattr(param, "lr_scale"): + attrs["lr_scale"] = param.lr_scale + if getattr(param, "no_weight_decay", False) or isinstance(module, norm_types): + attrs["weight_decay"] = 0 + group_name = "/".join(["%s:%s" % (v[0], v[1]) for v in list(attrs.items())]) + groups[group_name] = groups.get(group_name, {**attrs, **{"params": []}}) + groups[group_name]["params"].append(param) + return list(groups.values()) + + +def load_weights(module, weights_file, prefix_removed="", strict=True): + """Load a weights file.""" + if not weights_file: + return + if weights_file.endswith(".pkl"): + with open(weights_file, "rb") as f: + state_dict = pickle.load(f) + for k, v in state_dict.items(): + state_dict[k] = torch.as_tensor(v) + else: + state_dict = torch.load(weights_file, map_location="cpu", weights_only=False) + if prefix_removed: + new_state_dict = type(state_dict)() + for k in list(state_dict.keys()): + if k.startswith(prefix_removed): + new_state_dict[k.replace(prefix_removed, "")] = state_dict.pop(k) + state_dict = new_state_dict + module.load_state_dict(state_dict, strict=strict) + + +def manual_seed(seed, device_and_seed=None): + """Set the cpu and device random seed.""" + torch.manual_seed(seed) + if device_and_seed is not None: + device_index, device_seed = device_and_seed + device_type = get_device(device_index).type + np.random.seed(device_seed) + if device_type in ("cuda", "mps"): + getattr(torch, device_type).manual_seed(device_seed) + + +def synchronize_device(device): + """Synchronize the computation of device.""" + if device.type in ("cuda", "mps"): + getattr(torch, device.type).synchronize(device) diff --git a/URSA/diffnext/engine/lr_scheduler.py b/URSA/diffnext/engine/lr_scheduler.py new file mode 100644 index 0000000000000000000000000000000000000000..4b11cdff6a2c26aa36c42067069c7276c68fd487 --- /dev/null +++ b/URSA/diffnext/engine/lr_scheduler.py @@ -0,0 +1,76 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 
2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ------------------------------------------------------------------------ +"""Learning rate schedulers.""" + +import math + + +class ConstantLR(object): + """Constant LR scheduler.""" + + def __init__(self, **kwargs): + self._lr_max = kwargs.pop("lr_max") + self._lr_min = kwargs.pop("lr_min", 0) + self._warmup_steps = kwargs.pop("warmup_steps", 0) + self._warmup_factor = kwargs.pop("warmup_factor", 0.001) + self._step_count, self._last_decay = 0, 1.0 + + def step(self): + self._step_count += 1 + + def get_lr(self): + if self._step_count < self._warmup_steps: + alpha = (self._step_count + 1.0) / self._warmup_steps + return self._lr_max * (alpha + (1.0 - alpha) * self._warmup_factor) + return self._lr_min + (self._lr_max - self._lr_min) * self.get_decay() + + def get_decay(self): + return self._last_decay + + +class CosineLR(ConstantLR): + """LR scheduler with cosine decay.""" + + def __init__(self, lr_max, max_steps, lr_min=0, decay_step=1, **kwargs): + super().__init__(lr_max=lr_max, lr_min=lr_min, **kwargs) + self._decay_step, self._max_steps = decay_step, max_steps + + def get_decay(self): + t = self._step_count - self._warmup_steps + t_max = self._max_steps - self._warmup_steps + if t > 0 and t % self._decay_step == 0: + self._last_decay = 0.5 * (1.0 + math.cos(math.pi * t / t_max)) + return self._last_decay + + +class MultiStepLR(ConstantLR): + """LR scheduler with multi-steps decay.""" + + def __init__(self, lr_max, decay_steps, decay_gamma, **kwargs): + super().__init__(lr_max=lr_max, **kwargs) + self._decay_steps, self._decay_gamma = decay_steps, decay_gamma + self._stage_count, self._num_stages = 0, len(decay_steps) + + def get_decay(self): + if self._stage_count < self._num_stages: + k = self._decay_steps[self._stage_count] + while self._step_count >= k: + self._stage_count += 1 + if self._stage_count >= self._num_stages: + break + k = self._decay_steps[self._stage_count] + self._last_decay = self._decay_gamma**self._stage_count + return self._last_decay diff --git a/URSA/diffnext/engine/model_ema.py b/URSA/diffnext/engine/model_ema.py new file mode 100644 index 0000000000000000000000000000000000000000..e28d7661eeb27792d28d27a900712fbf5f3a94ac --- /dev/null +++ b/URSA/diffnext/engine/model_ema.py @@ -0,0 +1,44 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+# ------------------------------------------------------------------------ +"""Exponential Moving Average (EMA) of model updates.""" + +import copy +import torch + + +class ModelEMA(torch.nn.Module): + """Model Exponential Moving Average.""" + + def __init__(self, model, decay=0.99, update_every=100, device="gpu"): + super().__init__() + self.decay = decay + self.update_every = update_every + self.model = copy.deepcopy(model).eval() + self.model._apply(lambda t: t.float() if t.requires_grad else t) if decay < 1 else None + [setattr(p, "requires_grad", False) for p in self.model.parameters()] + self.model.cpu() if device == "cpu" else None + + def forward(self, *args, **kwargs): + return self.model(*args, **kwargs) + + @torch.no_grad() + def update(self, model): + for ema_v, model_v in zip(self.model.parameters(), model.parameters()): + if not model_v.requires_grad: + continue + new_value = model_v.data.float() + value = ema_v.to(device=new_value.device) + ema_v.copy_(value.mul_(self.decay).add_(new_value, alpha=1 - self.decay)) diff --git a/URSA/diffnext/engine/train_engine.py b/URSA/diffnext/engine/train_engine.py new file mode 100644 index 0000000000000000000000000000000000000000..9f3a4668b96d6a58735c754dcef2890b34f04b40 --- /dev/null +++ b/URSA/diffnext/engine/train_engine.py @@ -0,0 +1,195 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+# ------------------------------------------------------------------------ +"""Custom trainer focused on data parallelism specialization.""" + +import collections +import os +import shutil + +import torch + +from diffnext.engine import engine_utils +from diffnext.engine.model_ema import ModelEMA +from diffnext.pipelines.builder import build_pipeline +from diffnext.pipelines.builder import get_pipeline_path +from diffnext.utils import accelerate_utils +from diffnext.utils import profiler +from diffnext.utils.omegaconf_utils import config_to_class +from diffnext.utils.omegaconf_utils import config_to_object + + +class Trainer(object): + """Schedule the iterative model training.""" + + def __init__(self, config, accelerator, logger): + """Create a trainer instance.""" + self.config, self.accelerator, self.logger = config, accelerator, logger + self.dtype = accelerate_utils.precision_to_dtype(config.training.mixed_precision) + self.train_dataloader = config_to_object(config.train_dataloader) + self.pipe_path = get_pipeline_path(**config.pipeline.paths) + self.pipe = build_pipeline(self.pipe_path, config_to_class(config.pipeline), self.dtype) + self.pipe = self.pipe.to(device=engine_utils.get_device(config.training.gpu_id)) + self.ema = ModelEMA(self.pipe.model, **config.ema.params) if "ema" in config else None + self.model = self.pipe.configure_model(config, accelerator, logger) + param_groups = [_ for _ in self.model.parameters() if _.requires_grad] + if config.optimizer.get("param_groups", True): + param_groups = engine_utils.get_param_groups(self.model) + self.optimizer = config_to_object(config.optimizer, params=param_groups) + self.scheduler = config_to_object(config.lr_scheduler) + self.model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer) + if config.training.get("sequence_parallel_size", 1) > 1: + if not hasattr(self.pipe.model, "configure_sequence_parallel"): + raise RuntimeError("Model does not support sequence parallelism.") + self.pipe.model.configure_sequence_parallel(self.model.seq_parallel_group) + self.metrics = collections.OrderedDict() + if self.ema and config.experiment.resume_iter > 0: + ckpt = config.experiment.resume_from_checkpoint + ema_ckpt = ckpt.replace("checkpoints", "ema_checkpoints") + ema_weights = os.path.join(ema_ckpt, config.model.name, "diffusion_pytorch_model.bin") + engine_utils.load_weights(self.ema.model, ema_weights) + + @property + def global_step(self) -> int: + """Return the global iteration step. + + Returns: + int: The global step. + """ + return self.scheduler._step_count + + def save(self): + """Save the checkpoint of current iterative step.""" + f = "checkpoint-{}/{}".format(self.global_step, self.config.model.name) + f = os.path.join(self.config.experiment.output_dir, "checkpoints", f) + if self.accelerator.is_main_process and not os.path.exists(f): + self.model.save_pretrained(f, safe_serialization=False) + self.logger.info("Wrote snapshot to: {:s}".format(f)) + if self.ema is not None: + config_json = os.path.join(f, "config.json") + f = f.replace("checkpoints", "ema_checkpoints") + os.makedirs(f), shutil.copy(config_json, os.path.join(f, "config.json")) + f = os.path.join(f, "diffusion_pytorch_model.bin") + torch.save(self.ema.model.state_dict(), f) + + def add_metrics(self, stats): + """Add or update the metrics. + + Args: + stats (Dict) + The current iteration stats. 
+ """ + for k, v in stats["metrics"].items(): + if k not in self.metrics: + self.metrics[k] = profiler.SmoothedValue() + self.metrics[k].update(v) + + def log_metrics(self, stats): + """Send metrics to available trackers. + + Args: + stats (Dict) + The current iteration stats. + """ + iter_template = "Iteration %d, lr = %.8f, time = %.2fs" + metric_template = " " * 4 + "Train net output({}): {}" + [self.logger.info(iter_template % (stats["step"], stats["lr"], stats["time"]))] + [self.logger.info(metric_template.format(k, v)) for k, v in self.metrics.items()] + tracker_logs = dict((k, v.median) for k, v in self.metrics.items()) + tracker_logs.update({"lr": stats["lr"], "time": stats["time"]}) + self.accelerator.log(tracker_logs, step=stats["step"]) + self.metrics.clear() + + def run_model(self, inputs, metrics, accum_steps=1): + """Run multiple model steps. + + Args: + inputs (Dict) + The model inputs. + metrics (Dict) + The current iteration metrics. + accum_step (int, optional, defaults to 1) + The gradient accumulation steps. + + """ + for _ in range(accum_steps): + inputs = inputs if inputs else self.train_dataloader.next()[0] + outputs, losses = self.model(inputs), [] + for k, v in outputs.items(): + if "loss" not in k and "metric" not in k: + continue + if isinstance(v, torch.Tensor) and v.requires_grad: + losses.append(v) + if k.startswith("metric/"): # Custom metrics. + metrics[k[len("metric/") :]] += float(v.mean()) / accum_steps + elif f"metric/{k}" not in outputs: # Legacy metrics. + metrics[k] += float(self.accelerator.gather(v).mean()) / accum_steps + losses = sum(losses[1:], losses[0]) + self.accelerator.accumulate().__enter__() + self.accelerator.backward(losses) + + def run_step(self, inputs, accum_steps=1) -> dict: + """Run single iteration step. + + Args: + inputs (Dict) + The model inputs. + accum_step (int, optional, defaults to 1) + The gradient accumulation steps. + + Returns: + Dict: The current iteration stats. 
+ """ + stats = {"step": self.global_step} + metrics = collections.defaultdict(float) + timer = profiler.Timer().tic() + stats["lr"] = self.scheduler.get_lr() + for group in self.optimizer.param_groups: + group["lr"] = stats["lr"] * group.get("lr_scale", 1.0) + self.run_model(inputs, metrics, accum_steps) + self.optimizer.step() + self.optimizer.zero_grad(set_to_none=True) + self.scheduler.step() + stats["time"] = timer.toc() + stats["metrics"] = collections.OrderedDict(sorted(metrics.items())) + return stats + + def train_loop(self): + """Training loop.""" + timer = profiler.Timer() + max_steps = self.config.training.max_train_steps + accum_steps = self.config.training.gradient_accumulation_steps + log_every = self.config.experiment.log_every + save_every = self.config.experiment.save_every + data_every, inputs = self.config.training.get("data_every", -1), {} + self.scheduler._step_count = self.config.experiment.get("resume_iter", 0) + while self.global_step < max_steps: + if data_every >= 1 and self.global_step % data_every == 0: + inputs = self.train_dataloader.next()[0] + with timer.tic_and_toc(): + stats = self.run_step(inputs, accum_steps) + self.add_metrics(stats) + if stats["step"] % log_every == 0: + self.log_metrics(stats) + if self.global_step % (10 * log_every) == 0: + self.logger.info(profiler.get_progress(timer, self.global_step, max_steps)) + if self.ema and self.global_step % self.ema.update_every == 0: + self.ema.update(self.model) + if self.global_step % save_every == 0: + self.save() + stats["step"] = self.global_step + self.log_metrics({**stats, **{"step": self.global_step}}) + self.accelerator.wait_for_everyone() + self.accelerator.end_training() diff --git a/URSA/diffnext/models/__init__.py b/URSA/diffnext/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..5dba9c7ce2aff73501c11425026c9d293bc2d827 --- /dev/null +++ b/URSA/diffnext/models/__init__.py @@ -0,0 +1,16 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ------------------------------------------------------------------------ +"""Models.""" diff --git a/URSA/diffnext/models/diffusion_mlp.py b/URSA/diffnext/models/diffusion_mlp.py new file mode 100644 index 0000000000000000000000000000000000000000..308fde7ac859d913a937ac595de3485456fe90d5 --- /dev/null +++ b/URSA/diffnext/models/diffusion_mlp.py @@ -0,0 +1,99 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ------------------------------------------------------------------------ +"""Diffusion MLP.""" + +import torch +from torch import nn +from torch.utils.checkpoint import checkpoint as apply_ckpt + +from diffnext.models.embeddings import PatchEmbed +from diffnext.models.normalization import AdaLayerNormZero + + +class Projector(nn.Module): + """MLP Projector layer.""" + + def __init__(self, dim, mlp_dim=None, out_dim=None): + super(Projector, self).__init__() + self.fc1 = nn.Linear(dim, mlp_dim or dim) + self.fc2 = nn.Linear(mlp_dim or dim, out_dim or dim) + self.activation = nn.SiLU() + + def forward(self, x) -> torch.Tensor: + return self.fc2(self.activation(self.fc1(x))) + + +class DiffusionBlock(nn.Module): + """Diffusion block.""" + + def __init__(self, dim): + super(DiffusionBlock, self).__init__() + self.dim, self.mlp_checkpointing = dim, False + self.norm1 = AdaLayerNormZero(dim, num_stats=3, eps=1e-6) + self.proj, self.norm2 = Projector(dim, dim, dim), nn.LayerNorm(dim) + + def forward(self, x, z) -> torch.Tensor: + if self.mlp_checkpointing and x.requires_grad: + h, (gate,) = apply_ckpt(self.norm1, x, z, use_reentrant=False) + return self.norm2(apply_ckpt(self.proj, h, use_reentrant=False)).mul(gate).add_(x) + h, (gate,) = self.norm1(x, z) + return self.norm2(self.proj(h)).mul(gate).add_(x) + + +class TimeCondEmbed(nn.Module): + """Time-Condition embedding layer.""" + + def __init__(self, cond_dim, embed_dim, freq_dim=256): + super(TimeCondEmbed, self).__init__() + self.timestep_proj = Projector(freq_dim, embed_dim, embed_dim) + self.condition_proj = Projector(cond_dim, embed_dim, embed_dim) + self.freq_dim, self.time_freq = freq_dim, None + + def get_freq_embed(self, timestep, dtype) -> torch.Tensor: + if self.time_freq is None: + dim, log_theta = self.freq_dim // 2, 9.210340371976184 # math.log(10000) + freq = torch.arange(dim, dtype=torch.float32, device=timestep.device) + self.time_freq = freq.mul(-log_theta / dim).exp().unsqueeze(0) + emb = timestep.unsqueeze(-1).float() * self.time_freq + return torch.cat([emb.cos(), emb.sin()], dim=-1).to(dtype=dtype) + + def forward(self, timestep, z) -> torch.Tensor: + t = self.timestep_proj(self.get_freq_embed(timestep, z.dtype)) + return self.condition_proj(z).add_(t.unsqueeze_(1) if t.dim() == 2 else t) + + +class DiffusionMLP(nn.Module): + """Diffusion MLP model.""" + + def __init__(self, depth, embed_dim, cond_dim, patch_size=2, image_dim=4): + super(DiffusionMLP, self).__init__() + self.patch_embed = PatchEmbed(image_dim, embed_dim, patch_size) + self.time_cond_embed = TimeCondEmbed(cond_dim, embed_dim) + self.blocks = nn.ModuleList(DiffusionBlock(embed_dim) for _ in range(depth)) + self.norm = AdaLayerNormZero(embed_dim, num_stats=2, eps=1e-6) + self.head = nn.Linear(embed_dim, patch_size**2 * image_dim) + + def forward(self, x, timestep, z, pred_ids=None) -> torch.Tensor: + x, o = self.patch_embed(x), None if pred_ids is None else x + o = None if pred_ids is None else self.patch_embed.patchify(o) + x = x if pred_ids is None else x.gather(1, pred_ids.expand(-1, -1, x.size(-1))) + z = z if pred_ids is None else z.gather(1, 
pred_ids.expand(-1, -1, z.size(-1))) + z = self.time_cond_embed(timestep, z) + for blk in self.blocks: + x = blk(x, z) + x = self.norm(x, z)[0] + x = self.head(x) + return x if pred_ids is None else o.scatter(1, pred_ids.expand(-1, -1, x.size(-1)), x) diff --git a/URSA/diffnext/models/diffusion_transformer.py b/URSA/diffnext/models/diffusion_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..7eb9764cae6967efe9a9466e538b71fe8168319c --- /dev/null +++ b/URSA/diffnext/models/diffusion_transformer.py @@ -0,0 +1,151 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ------------------------------------------------------------------------ +"""Diffusion Transformer.""" + +from functools import partial +from typing import Tuple + +import torch +from torch import nn +from torch.utils.checkpoint import checkpoint as apply_ckpt + +from diffnext.models.embeddings import PatchEmbed, RotaryEmbed3D +from diffnext.models.normalization import AdaLayerNormZero, AdaLayerNormSingle +from diffnext.models.diffusion_mlp import Projector, TimeCondEmbed + + +class TimeEmbed(TimeCondEmbed): + """Time embedding layer.""" + + def __init__(self, embed_dim, freq_dim=256): + nn.Module.__init__(self) + self.timestep_proj = Projector(freq_dim, embed_dim, embed_dim) + self.freq_dim, self.time_freq = freq_dim, None + + def forward(self, timestep) -> torch.Tensor: + dtype = self.timestep_proj.fc1.weight.dtype + temb = self.timestep_proj(self.get_freq_embed(timestep, dtype)) + return temb.unsqueeze_(1) if temb.dim() == 2 else temb + + +class MLP(nn.Module): + """Two layers MLP.""" + + def __init__(self, dim, mlp_ratio=4): + super(MLP, self).__init__() + self.fc1 = nn.Linear(dim, int(dim * mlp_ratio)) + self.fc2 = nn.Linear(int(dim * mlp_ratio), dim) + self.activation = nn.GELU() + + def forward(self, x) -> torch.Tensor: + return self.fc2(self.activation(self.fc1(x))) + + +class Attention(nn.Module): + """Multihead attention.""" + + def __init__(self, dim, num_heads, qkv_bias=True): + super(Attention, self).__init__() + self.num_heads, self.head_dim = num_heads, dim // num_heads + self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias) + self.proj, self.pe_func = nn.Linear(dim, dim), None + + def forward(self, x) -> torch.Tensor: + qkv_shape = [-1, x.size(1), 3, self.num_heads, self.head_dim] + q, k, v = self.qkv(x).view(qkv_shape).permute(2, 0, 3, 1, 4).unbind(dim=0) + q, k = (self.pe_func(q), self.pe_func(k)) if self.pe_func else (q, k) + o = nn.functional.scaled_dot_product_attention(q, k, v) + return self.proj(o.transpose(1, 2).flatten(2)) + + +class Block(nn.Module): + """Transformer block.""" + + def __init__(self, dim, num_heads, mlp_ratio=4, qkv_bias=True, modulation_type=None): + super(Block, self).__init__() + self.modulation = (modulation_type or AdaLayerNormZero)(dim, num_stats=6, eps=1e-6) + self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim) + 
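# Note: norm1/norm2 normalize the attention/MLP outputs (post-norm) before the gated residual adds. +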
self.attn = Attention(dim, num_heads, qkv_bias=qkv_bias) + self.mlp = MLP(dim, mlp_ratio=mlp_ratio) + self.attn_checkpointing = self.mlp_checkpointing = self.stg_skip = False + + def forward_modulation(self, x, z) -> Tuple[torch.Tensor, Tuple[torch.Tensor]]: + return self.modulation(x, z) + + def forward_attn(self, x) -> torch.Tensor: + return self.norm1(self.attn(x)) + + def forward_mlp(self, x) -> torch.Tensor: + return self.norm2(self.mlp(x)) + + def forward_ckpt(self, x, name) -> torch.Tensor: + if getattr(self, f"{name}_checkpointing", False) and x.requires_grad: + return apply_ckpt(getattr(self, f"forward_{name}"), x, use_reentrant=False) + return getattr(self, f"forward_{name}")(x) + + def forward(self, x, z, pe_func: callable = None) -> torch.Tensor: + self.attn.pe_func = pe_func + stg_x = x.chunk(3)[-1] if self.stg_skip else None + if self.mlp_checkpointing and x.requires_grad: + x, stats = apply_ckpt(self.forward_modulation, x, z, use_reentrant=False) + else: + x, stats = self.forward_modulation(x, z) + gate_msa, scale_mlp, shift_mlp, gate_mlp = stats + x = self.forward_ckpt(x, "attn").mul(gate_msa).add_(x) + x = self.modulation.norm(x).mul(1 + scale_mlp).add_(shift_mlp) + x = self.forward_ckpt(x, "mlp").mul(gate_mlp).add_(x) + return torch.cat(x.chunk(3)[:2] + (stg_x,)) if self.stg_skip else x + + +class DiffusionTransformer(nn.Module): + """Diffusion transformer.""" + + def __init__( + self, + depth, + embed_dim, + num_heads, + mlp_ratio=4, + patch_size=2, + image_size=32, + image_dim=None, + modulation=True, + ): + super(DiffusionTransformer, self).__init__() + final_norm = AdaLayerNormSingle if modulation else AdaLayerNormZero + block = partial(Block, modulation_type=AdaLayerNormSingle) if modulation else Block + self.embed_dim, self.image_size, self.image_dim = embed_dim, image_size, image_dim + self.patch_embed = PatchEmbed(image_dim, embed_dim, patch_size) + self.time_embed = TimeEmbed(embed_dim) + self.modulation = AdaLayerNormZero(embed_dim, num_stats=6, eps=1e-6) if modulation else None + self.rope = RotaryEmbed3D(embed_dim // num_heads) + self.blocks = nn.ModuleList(block(embed_dim, num_heads, mlp_ratio) for _ in range(depth)) + self.norm = final_norm(embed_dim, num_stats=2, eps=1e-6) + self.head = nn.Linear(embed_dim, patch_size**2 * image_dim) + + def prepare_pe(self, c=None, pos=None) -> Tuple[callable, callable]: + return self.rope.get_func(pos, pad=0 if c is None else c.size(1)) + + def forward(self, x, timestep, c=None, pos=None) -> torch.Tensor: + x = self.patch_embed(x) + t = self.time_embed(timestep) + z = self.modulation.proj(self.modulation.activation(t)) if self.modulation else t + pe = self.prepare_pe(c, pos) if pos is not None else None + x = x if c is None else torch.cat([c, x], dim=1) + for blk in self.blocks: + x = blk(x, z, pe) + x = self.norm(x if c is None else x[:, c.size(1) :], t)[0] + return self.head(x) diff --git a/URSA/diffnext/models/embeddings.py b/URSA/diffnext/models/embeddings.py new file mode 100644 index 0000000000000000000000000000000000000000..da1ddb70df96f7201e3c07ddd760f6d81e7042d7 --- /dev/null +++ b/URSA/diffnext/models/embeddings.py @@ -0,0 +1,361 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ------------------------------------------------------------------------ +"""Embedding layers.""" + +import sys +from typing import List, Tuple, Union + +import numpy as np +import scipy.stats as stats +import torch +from torch import nn + + +class FlexRotaryEmbedding(nn.Identity): + """Flexible rotary position embedding layer.""" + + class PEFunc(object): + """Apply RoPE weight to Q/K tensor.""" + + def __init__(self, weight: torch.Tensor): + self.weight = weight + + @torch.compile(fullgraph=True, disable=sys.platform != "linux") + def interleaved_impl(self, x: torch.Tensor, w: torch.Tensor) -> torch.Tensor: + return w[..., 0].mul(x[..., 0]).add_(w[..., 1] * x[..., 1]).flatten(3) + + @torch.compile(fullgraph=True, disable=sys.platform != "linux") + def partitioned_impl(self, x: torch.Tensor, w: torch.Tensor) -> torch.Tensor: + return w[..., 0].mul(x[:, :, :, 0]).add_(w[..., 1] * x[:, :, :, 1]).flatten(3) + + def __call__(self, x: torch.Tensor, interleaved=False) -> torch.Tensor: + w = self.weight = self.weight.to(dtype=x.dtype) + x = x.unflatten(-1, (-1, 1, 2) if interleaved else (2, -1, 1)) + return (self.interleaved_impl if interleaved else self.partitioned_impl)(x, w) + + @staticmethod + def from_config(config): + head_dim = getattr(config, "head_dim", config.hidden_size // config.num_attention_heads) + # return FlexRotaryEmbedding(head_dim, base=config.rope_theta) + base = getattr(config, "rope_theta", None) + if base is None and hasattr(config, "to_dict"): + base = config.to_dict().get("rope_theta", None) + if base is None: + base = 10000.0 + return FlexRotaryEmbedding(head_dim, base=float(base)) + + def __init__(self, dim=128, base=10000.0): + super(FlexRotaryEmbedding, self).__init__() + self.dim, self.base = dim, base + self.rep1, self.rep2 = dim // 8, (dim // 2 - dim // 8 * 3) // 2 + self.register_buffer("scale", torch.arange(0, dim, 2).float() / dim, persistent=False) + + def get_pos(self, input_shape, shift=0, has_bov=True) -> torch.Tensor: + num_blocks = 1 if len(input_shape) < 4 else input_shape[-4] + block_size = 1 if len(input_shape) < 3 else input_shape[-3] + grid_shape = [num_blocks * block_size] + list(input_shape[-2:]) + pos = torch.zeros(grid_shape + [3], dtype=torch.int32, device=self.scale.device) + grid = [torch.arange(_, device=pos.device) for _ in grid_shape] + [pos[..., i].add_(grid[i].view([-1 if i == j else 1 for j in range(3)])) for i in range(3)] + pos, device = pos.unflatten(0, (-1, block_size)).flatten(1, 3), pos.device + bov_pos = torch.arange(num_blocks, device=device).view(-1, 1, 1).repeat(1, 1, 3) + pos[..., 0] += torch.arange(num_blocks, device=device).view(-1, 1).add_(shift + has_bov) + return torch.cat([bov_pos.mul(block_size + 1).add(shift), pos], 1) if has_bov else pos + + def get_func(self, pos: torch.Tensor, *args, **kwargs) -> PEFunc: + t = torch.cat([pos.repeat(1, 1, self.rep1), pos[..., 1:].repeat(1, 1, self.rep2)], -1) + freq = t * torch.pow(self.base, self.scale.float()).reciprocal_().unsqueeze(0) + freq = torch.stack([freq.cos(), -freq.sin(), freq.sin(), freq.cos()], dim=-1) + return self.PEFunc(freq.view(freq.shape[:-1] + (2, 
2)).unsqueeze(2)) + + +class RotaryEmbed3D(nn.Identity): + """3D rotary position embedding layer.""" + + class PEFunc(object): + """Apply RoPE weight to Q/K tensor.""" + + def __init__(self, weight: torch.Tensor): + self.weight = weight + + @torch.compile(fullgraph=True, disable=sys.platform != "linux") + def call_impl(self, x: torch.Tensor, w: torch.Tensor) -> torch.Tensor: + return w[..., 0].mul(x[..., 0]).add_(w[..., 1] * x[..., 1]).flatten(3) + + def __call__(self, x: torch.Tensor) -> torch.Tensor: + x = x.view(*x.shape[:-1], -1, 1, 2) + w = self.weight = self.weight.to(dtype=x.dtype) + return self.call_impl(x, w) + + def __init__(self, dim=64, base_size=(16, 16), theta=10000.0): + super(RotaryEmbed3D, self).__init__() + self.dim, self.base_size, self.theta = dim, base_size, theta + for i, rotary_dim in enumerate(([dim // 8] + [(dim - dim // 8) // 2] * 2)): + scale = torch.arange(0, rotary_dim, 2).float().div_(rotary_dim) + self.register_buffer("scale%d" % i, scale, persistent=False) + + def get_pos(self, t=1, bs=1, hw=None) -> torch.Tensor: + thw = [t] + list(hw or self.base_size) + pos = torch.zeros(thw + [3], device=self.scale1.device) + grid = [torch.arange(_, device=self.scale1.device) for _ in thw] + [pos[..., i].add_(grid[i].view([-1 if i == j else 1 for j in range(3)])) for i in range(3)] + return pos.view(1, -1, 3).expand(bs, -1, -1) + + def get_func(self, pos: torch.Tensor, pad=0, ids: torch.Tensor = None) -> PEFunc: + pos, weight = pos.gather(1, ids) if ids is not None else pos, [] + pos = nn.functional.pad(pos, (0, 0, pad, 0), value=0) if pad else pos + for i, grid in enumerate(pos.split(1, dim=-1)): + freq = torch.pow(self.theta, getattr(self, "scale%d" % i).float()) + freq = grid * freq.reciprocal().unsqueeze(0) + freq = torch.stack([freq.cos(), -freq.sin(), freq.sin(), freq.cos()], dim=-1) + weight += [freq.view(freq.shape[:-1] + (2, 2))] + return self.PEFunc(torch.cat(weight, dim=-3).unsqueeze(1)) + + +class PosEmbed(nn.Module): + """Position embedding layer.""" + + def __init__(self, dim, base_size=(16, 16)): + super(PosEmbed, self).__init__() + (self.base_h, self.base_w), self.space_embed = base_size, None + self.freq_hw = 1 / (10000 ** (torch.arange(dim // 4, dtype=torch.float32) / (dim // 4))) + + def get_space_embed(self, device=None, dtype=None) -> torch.Tensor: + h, w = self.base_h, self.base_w + if self.space_embed is not None and self.space_embed.size(0) == h * w: + return self.space_embed + grid_h = torch.arange(h, dtype=torch.float32) * (self.base_h / h) + grid_w = torch.arange(w, dtype=torch.float32) * (self.base_w / w) + grid_w, grid_h = torch.meshgrid(grid_w, grid_h, indexing="xy") + freq_w, freq_h = [_.reshape(-1, 1) * self.freq_hw.unsqueeze(0) for _ in (grid_w, grid_h)] + embed = torch.cat([freq_w.sin(), freq_w.cos(), freq_h.sin(), freq_h.cos()], dim=-1) + self.space_embed = embed.to(device=device, dtype=dtype) + return self.space_embed + + def forward(self, x) -> torch.Tensor: + return x.add_(self.get_space_embed(x.device, x.dtype)) + + +class VideoPosEmbed(PosEmbed): + """Video position embedding layer.""" + + def __init__(self, dim, base_size): + super(VideoPosEmbed, self).__init__(dim, base_size=base_size[1:]) + self.base_t, self.time_embed, self.norm = base_size[0], None, nn.LayerNorm(dim) + self.time_proj = nn.Sequential(nn.Linear(256, dim), nn.SiLU(), nn.Linear(dim, dim)) + self.freq_t = 1 / (10000 ** (torch.arange(128, dtype=torch.float32).unsqueeze(0) / 128)) + + def get_time_embed(self, t) -> torch.Tensor: + if self.time_embed is not None and t 
== self.time_embed.size(0): + return self.norm(self.time_proj(self.time_embed)) + device, dtype = self.time_proj[0].weight.device, self.time_proj[0].weight.dtype + grid = torch.arange(t, dtype=torch.float32) / (t / self.base_t) + freq_t = grid.view(-1, 1, 1).mul(self.freq_t) + sincos = torch.cat([freq_t.sin(), freq_t.cos()], dim=-1) + self.time_embed = sincos.to(device=device, dtype=dtype) + return self.norm(self.time_proj(self.time_embed)) + + def forward(self, x) -> torch.Tensor: + x = x.add_(self.get_time_embed(x.size(-3))) if x.dim() == 4 else x + return x.add_(self.get_space_embed(x.device, x.dtype)) + + +class MotionEmbed(nn.Module): + """Motion embedding layer.""" + + def __init__(self, dim, base_flow=5, base_fps=12): + super(MotionEmbed, self).__init__() + self.base_flow, self.base_fps = base_flow, base_fps + self.flow_proj = nn.Sequential(nn.Linear(256, dim), nn.SiLU(), nn.Linear(dim, dim)) + self.fps_proj = nn.Sequential(nn.Linear(256, dim), nn.SiLU(), nn.Linear(dim, dim)) + self.freq_m = 1 / (10000 ** (torch.arange(128, dtype=torch.float32).unsqueeze(0) / 128)) + + def get_embed(self, c, x, k) -> torch.Tensor: + x = [getattr(self, f"base_{k}")] * c.size(0) if x is None else x + freq_m = torch.as_tensor(x).view(-1, 1, 1).float().mul(self.freq_m) + sincos = torch.cat([freq_m.sin(), freq_m.cos()], dim=-1) + return getattr(self, f"{k}_proj")(sincos.to(device=c.device, dtype=c.dtype)) + + def forward(self, c, flow=None, fps=None) -> torch.Tensor: + outputs = [self.get_embed(c, x, k) for k, x in [("flow", flow), ("fps", fps)]] + return torch.cat(outputs, dim=1) if len(outputs) > 1 else outputs[0] + + +class PatchEmbed(nn.Module): + """Patch embedding layer.""" + + def __init__(self, image_dim, embed_dim, patch_size): + super(PatchEmbed, self).__init__() + self.patch_size = patch_size + self.image_dim, self.height, self.width = image_dim, None, None + self.proj = nn.Conv2d(image_dim, embed_dim, patch_size, patch_size) + + @property + def hw(self) -> Tuple[int, int]: + return self.height, self.width + + def patchify(self, x) -> torch.Tensor: + x = x.view(-1, self.image_dim, self.height, self.patch_size, self.width, self.patch_size) + return x.permute(0, 2, 4, 3, 5, 1).flatten(1, 2).flatten(2, 4).contiguous() + + def unpatchify(self, x) -> torch.Tensor: + x = x.view(-1, self.height, self.width, self.patch_size, self.patch_size, self.image_dim) + return x.permute(0, 5, 1, 3, 2, 4).flatten(2, 3).flatten(3, 4).contiguous() + + def forward(self, x) -> torch.Tensor: + flat_shape = (x.size(0), x.size(2)) if x.dim() == 5 else None + x = x.transpose(1, 2).flatten(0, 1) if x.dim() == 5 else x + self.width = x.size(-1) // self.patch_size if x.dim() == 4 else self.width + self.height = x.size(-2) // self.patch_size if x.dim() == 4 else self.height + x = self.proj(x).flatten(2).transpose(1, 2) if x.dim() == 4 else x + return x.view(flat_shape + x.shape[1:]) if flat_shape else x + + +class TextEmbed(nn.Module): + """Encode text tokens into embeddings.""" + + def __init__(self, token_dim, embed_dim, num_tokens=256, dropout=0.1): + super(TextEmbed, self).__init__() + self.token_dim, self.num_tokens, self.encoders = token_dim, num_tokens, [] + self.proj, self.norm = nn.Linear(token_dim, embed_dim), nn.LayerNorm(embed_dim) + self.register_buffer("weight", torch.zeros(512, token_dim)) # Maximum positions. 
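+ # Frozen random token bank (see normal_ init below); encode_prompts() clones
+ # it as the default embedding for padded or dropout-nulled prompt positions.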
+ _, self.dropout, self.mask = nn.init.normal_(self.weight, std=0.02), dropout, [] + + @torch.no_grad() + def encode_prompts(self, prompts, prompt_size=None) -> torch.Tensor: + device, dtype = self.weight.device, self.weight.dtype + x = self.weight[: self.num_tokens].expand(len(prompts), -1, -1).clone() + for i, p in enumerate(prompts if not isinstance(prompts[0], str) else []): + if self.training and self.dropout > 0 and np.random.rand() < self.dropout: + continue + x[i, : p.shape[0]] = torch.as_tensor(p, device=device).to(dtype) + if not isinstance(prompts[0], str): + return x + tokenizer, encoder = self.encoders + trunc_args = {"max_length": self.num_tokens, "truncation": True} + pad_args = {"padding": "max_length", **trunc_args} + tokens = [tokenizer(p, **pad_args).input_ids for p in prompts] + maxlens = [len(tokenizer(p, **trunc_args).input_ids) for p in prompts] + tokens = torch.as_tensor(tokens, device=encoder.device) + embeds, x = encoder(tokens).last_hidden_state.to(dtype), x.to(encoder.device) + self.mask = [0] * (x.size(0) // prompt_size if prompt_size else 0) + for i, maxlen in enumerate([] if prompt_size else maxlens): + if self.training and self.dropout and np.random.rand() < self.dropout: + continue + x[i, :maxlen] = embeds[i, :maxlen] + for k in range(x.size(0) // prompt_size if prompt_size else 0): + if np.random.rand() < self.dropout: + self.mask[k] = 1 + continue + for j in range(prompt_size): + if j and np.random.rand() < self.dropout: + continue + i, maxlen = k * prompt_size + j, maxlens[k * prompt_size + j] + x[i, :maxlen] = embeds[i, :maxlen] + return x + + def apply_mask(self, x, mask_token=0) -> torch.Tensor: + """Apply the current mask to input.""" + if len(self.mask) == 0: + return x + mask = torch.as_tensor(self.mask, device=x.device, dtype=x.dtype) + mask = mask.view([-1] + [1] * (x.dim() - 1)) + return x.mul(1 - mask).add_(mask_token * mask) + + def forward(self, x, prompt_size=None) -> torch.Tensor: + if isinstance(x, (tuple, list)): + return self.norm(self.proj(self.encode_prompts(x, prompt_size))) + return self.norm(self.proj(x)) + + +class LabelEmbed(nn.Module): + """Encode class labels into embeddings.""" + + def __init__(self, embed_dim, num_classes=1000, dropout=0.1): + super(LabelEmbed, self).__init__() + self.dropout, self.num_classes = dropout, num_classes + self.weight = nn.Parameter(torch.zeros(num_classes + (dropout > 0), embed_dim)) + _, self.norm = nn.init.normal_(self.weight, std=0.02), nn.LayerNorm(embed_dim) + + def forward(self, input_ids): + input_ids = input_ids.unsqueeze(-1) if input_ids.dim() == 1 else input_ids + if self.training and self.dropout > 0: + keep = torch.rand(input_ids.size(), device=input_ids.device).gt(self.dropout) + input_ids = input_ids.where(keep, self.num_classes) + return self.norm(self.weight[input_ids]) + + +class MaskEmbed(nn.Module): + """Apply mask positions to input embeddings.""" + + def __init__(self, embed_dim, mask_ratios=(0.7, 1.0)): + super(MaskEmbed, self).__init__() + self.mask_ratios = list(mask_ratios) + ([0.25] if len(mask_ratios) == 2 else []) + self.bos_token = nn.Parameter(torch.zeros(1, embed_dim)) + self.mask_token = nn.Parameter(torch.zeros(1, embed_dim)) + [nn.init.normal_(_, std=0.02) for _ in (self.bos_token, self.mask_token)] + self.mask, self.attn_mask = None, None + self.pred_ids, self.pred_pos, self.generator = None, 0, None + + def get_attn_lens( + self, x: Union[torch.Tensor, Tuple[torch.Tensor]], c: torch.Tensor = None + ) -> List[int]: + """Return the attention length according to 
inputs.""" + lens = [_.shape[1:3].numel() for _ in x] if isinstance(x, (tuple, list)) else [] + lens += [x.size(2)] * x.size(1) if not isinstance(x, (tuple, list)) else [] + lens[0] += c.size(1) if c is not None else 0 + return lens + + def get_attn_mask( + self, x: Union[torch.Tensor, Tuple[torch.Tensor]], c: torch.Tensor = None, persistent=True + ) -> torch.Tensor: + """Return the attention mask according to inputs.""" + if self.attn_mask is not None and persistent: + return self.attn_mask + if isinstance(x, (tuple, list)): + d = torch.cat([torch.full(_.shape[1:3], t) for t, _ in enumerate(x)]).flatten() + else: + d = torch.cat([torch.full([x.size(2)], i) for i in range(x.size(1))]) + d = torch.cat([torch.full([c.size(1)], 0), d]) if c is not None else d + attn_mask = torch.where(d.unsqueeze(1).ge(d.unsqueeze(0)), 0, -float("inf")) + self.attn_mask = attn_mask.to(device=self.bos_token.device, dtype=self.bos_token.dtype) + return self.attn_mask + + def get_pred_mask(self, num_preds) -> Tuple[torch.Tensor, torch.Tensor]: + """Return the current mask for next prediction.""" + if self.pred_ids is None: + u_dist = torch.empty_like(self.mask).uniform_(generator=self.generator) + self.pred_ids = u_dist.argsort(dim=1) + pred_ids = self.pred_ids[:, self.pred_pos : self.pred_pos + num_preds] + pred_mask = torch.zeros_like(self.mask).scatter_(1, pred_ids, 1) + self.pred_pos, self.mask = self.pred_pos + num_preds, self.mask.mul_(1 - pred_mask) + return pred_mask, pred_ids + + def apply_mask(self, x) -> torch.Tensor: + """Apply the current mask to input.""" + return x.mul(1 - self.mask).add_(self.mask_token * self.mask) + + def forward(self, x) -> torch.Tensor: + if self.training: + u_dist = torch.rand(x.shape[:-1] + (1,), device=x.device) + a, b = [(v - 1) / self.mask_ratios[2] for v in self.mask_ratios[:2]] + mask_ratio = stats.truncnorm(a, b, loc=1, scale=self.mask_ratios[2]).rvs(1)[0] + prev_ids = u_dist.argsort(1)[:, : int(np.round((1 - mask_ratio) * u_dist.size(1)))] + self.mask = x.new_ones(u_dist.shape).scatter_(1, prev_ids, 0) + return self.apply_mask(x), prev_ids + if self.mask is None: + self.mask, self.pred_pos = x.new_ones(x.shape[:-1] + (1,)), 0 + return self.apply_mask(x) diff --git a/URSA/diffnext/models/flash_attention.py b/URSA/diffnext/models/flash_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..8bb2d268d36c1f25af7c79ec77f09bc7546d9dc8 --- /dev/null +++ b/URSA/diffnext/models/flash_attention.py @@ -0,0 +1,99 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ------------------------------------------------------------------------ +"""Flash attention layers. 
Copied from https://github.com/Dao-AILab/flash-attention""" + +import torch + +# RoPE (Triton) +try: + from flash_attn.layers.rotary import apply_rotary_emb +except ImportError: + from einops import rearrange, repeat + + def rotate_half(x, interleaved=False) -> torch.Tensor: + if not interleaved: + x1, x2 = x.chunk(2, dim=-1) + return torch.cat((-x2, x1), dim=-1) + x1, x2 = x[..., ::2], x[..., 1::2] + return rearrange(torch.stack((-x2, x1), dim=-1), "... d two -> ... (d two)", two=2) + + def apply_rotary_emb(x, cos, sin, interleaved=False, inplace=False) -> torch.Tensor: + ro_dim = cos.shape[-1] * 2 + cos = repeat(cos, "... d -> ... 1 (2 d)" if not interleaved else "... d -> ... 1 (d 2)") + sin = repeat(sin, "... d -> ... 1 (2 d)" if not interleaved else "... d -> ... 1 (d 2)") + return torch.cat( + [ + x[..., :ro_dim] * cos + rotate_half(x[..., :ro_dim], interleaved) * sin, + x[..., ro_dim:], + ], + -1, + ) + + +# SwiGLU (TorchJIT) +swiglu_fwd_codestring = """ +template T swiglu_fwd(T x, T y) { + return float(x) * float(y) / (1.0f + ::exp(-float(x))); +} +""" +swiglu_bwd_codestring = """ +template void swiglu_bwd(T x, T y, T g, T& dx, T& dy) { + float x_sigmoid = 1.0f / (1.0f + ::exp(-float(x))); + dx = x_sigmoid * (1 + float(x) * (1.0f - x_sigmoid)) * float(g) * float(y); + dy = float(x) * x_sigmoid * float(g); +} +""" +swiglu_fwd = torch.cuda.jiterator._create_jit_fn(swiglu_fwd_codestring) +swiglu_bwd = torch.cuda.jiterator._create_multi_output_jit_fn(swiglu_bwd_codestring, num_outputs=2) + + +class SwiGLUFunction(torch.autograd.Function): + + @staticmethod + def forward(ctx, x, y): + ctx.save_for_backward(x, y) + return swiglu_fwd(x, y) + + @staticmethod + def backward(ctx, dout): + x, y = ctx.saved_tensors + return swiglu_bwd(x, y, dout) + + +swiglu = SwiGLUFunction.apply + +# RMSNorm (Triton) +try: + from flash_attn.ops.triton.layer_norm import RMSNorm +except ImportError: + + class RMSNorm(torch.nn.Module): + + def __init__(self, hidden_size, eps: float = 1e-6) -> None: + super().__init__() + self.weight = torch.nn.Parameter(torch.ones(hidden_size)) + self.eps = eps + + def forward(self, x: torch.Tensor) -> torch.Tensor: + x = x.mul(x.float().square().mean(-1, True).add_(self.eps).rsqrt().to(x.dtype)) + return x * self.weight + + +# CrossEntropy (Triton) +try: + from flash_attn.ops.triton.cross_entropy import cross_entropy_loss +except ImportError: + cross_entropy_loss = None diff --git a/URSA/diffnext/models/flex_attention.py b/URSA/diffnext/models/flex_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..936cf987805fe80d970c5bb5df3d45d455bea55f --- /dev/null +++ b/URSA/diffnext/models/flex_attention.py @@ -0,0 +1,81 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ------------------------------------------------------------------------ +"""Flex attention layers.""" + +from itertools import accumulate +from typing import List + +import torch +from torch import nn + +try: + from torch.nn.attention.flex_attention import create_block_mask + from torch.nn.attention.flex_attention import flex_attention +except ImportError: + flex_attention = create_block_mask = None + + +class FlexAttentionCausal2D(nn.Module): + """Block-wise causal flex attention.""" + + def __init__(self): + super(FlexAttentionCausal2D, self).__init__() + self.attn_func = self.offsets = self.flags = None + self.cu_offsets = self.block_mask = None + + def set_offsets(self, offsets: List[int]): + """Set block-wise mask offsets.""" + offsets = list(type(offsets)([0]) + offsets if offsets[0] != 0 else offsets) + if offsets != self.offsets: + self.offsets, self.block_mask = offsets, None + + def set_offsets_by_lens(self, lens, flags=None): + """Set block-wise mask offsets by lengths.""" + self.set_offsets(list(accumulate(type(lens)([0]) + lens if lens[0] != 0 else lens))) + self.flags = flags # Bidirectional flags (-1: lower triangular, 1: full) + + def get_mask_mod(self) -> callable: + """Return the mask modification.""" + counts = self.cu_offsets[1:] - self.cu_offsets[:-1] + ids = torch.arange(len(counts), device=self.cu_offsets.device, dtype=torch.int32) + ids = ids.repeat_interleave(counts) + if self.flags is None: + return lambda b, h, qi, ki: (qi >= ki) | (ids[qi] == ids[ki]) + flags = list(self.flags) + [-1] * (len(counts) - len(self.flags)) + flags = torch.as_tensor(flags, device=self.cu_offsets.device, dtype=torch.int32) + flags = flags.repeat_interleave(counts) + return lambda b, h, qi, ki: (qi >= ki) | ((ids[qi] * flags[qi]) == ids[ki]) + + def get_attn_func(self) -> callable: + """Return the attention function.""" + if flex_attention is None: + raise NotImplementedError(f"FlexAttn requires torch>=2.5 but got {torch.__version__}") + if self.attn_func is None: + self.attn_func = torch.compile(flex_attention) + return self.attn_func + + def get_block_mask(self, q: torch.Tensor) -> torch.Tensor: + """Return the attention block mask according to inputs.""" + if self.block_mask is not None: + return self.block_mask + b, h, q_len = q.shape[:3] + args = {"B": b, "H": h, "Q_LEN": q_len, "KV_LEN": q_len, "_compile": True} + self.cu_offsets = torch.as_tensor(self.offsets, device=q.device, dtype=torch.int32) + self.block_mask = create_block_mask(self.get_mask_mod(), **args) + return self.block_mask + + def forward(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor: + return self.get_attn_func()(q, k, v, block_mask=self.get_block_mask(q), enable_gqa=True) diff --git a/URSA/diffnext/models/guidance_scaler.py b/URSA/diffnext/models/guidance_scaler.py new file mode 100644 index 0000000000000000000000000000000000000000..b684cc2db9b5f7c406de0be3da6e8e48377aff03 --- /dev/null +++ b/URSA/diffnext/models/guidance_scaler.py @@ -0,0 +1,87 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ------------------------------------------------------------------------
+"""Classifier-free guidance scaler."""
+
+import torch
+
+
+class GuidanceScaler(object):
+    """Guidance scaler."""
+
+    def __init__(self, **kwargs):
+        self.guidance_scale = kwargs.get("guidance_scale", 1)
+        self.guidance_trunc = kwargs.get("guidance_trunc", 0)
+        self.guidance_renorm = kwargs.get("guidance_renorm", 1)
+        self.image_guidance_scale = kwargs.get("image_guidance_scale", 0)
+        self.spatiotemporal_guidance_scale = kwargs.get("spatiotemporal_guidance_scale", 0)
+        self.min_guidance_scale = kwargs.get("min_guidance_scale", None) or self.guidance_scale
+        self.inc_guidance_scale = self.guidance_scale - self.min_guidance_scale
+
+    @property
+    def extra_pass(self) -> bool:
+        """Return whether an additional (third) guidance pass is required."""
+        return self.image_guidance_scale + self.spatiotemporal_guidance_scale > 0
+
+    def clone(self):
+        """Return a copy of the current guidance scaler."""
+        return GuidanceScaler(**self.__dict__)
+
+    def decay_guidance_scale(self, decay=0):
+        """Decay the guidance scale toward the minimum guidance scale."""
+        self.guidance_scale = self.inc_guidance_scale * decay + self.min_guidance_scale
+
+    def expand(self, x: torch.Tensor, padding: torch.Tensor = None) -> torch.Tensor:
+        """Expand input tensor for guidance passes."""
+        x = torch.stack([x] * (3 if self.extra_pass else 2)) if self.guidance_scale > 1 else x
+        x.__setitem__(1, padding) if self.image_guidance_scale and padding is not None else None
+        return x.flatten(0, 1) if self.guidance_scale > 1 else x
+
+    def expand_text(self, c: torch.Tensor) -> torch.Tensor:
+        """Expand text embedding tensor for guidance passes."""
+        c = list(c.chunk(2)) if self.extra_pass else c
+        c.append(c[1]) if self.image_guidance_scale else None  # Null, Null
+        c.append(c[0]) if self.spatiotemporal_guidance_scale else None  # Null, Text
+        return torch.cat(c) if self.extra_pass else c
+
+    def maybe_disable(self, timestep, *args):
+        """Disable all guidance passes once the timestep falls below the truncation threshold."""
+        if self.guidance_scale > 1 and self.guidance_trunc:
+            if float(timestep) < self.guidance_trunc:
+                self.guidance_scale = 1
+                return [_.chunk(3 if self.extra_pass else 2)[0] for _ in args]
+        return args
+
+    def renorm(self, x, cond):
+        """Apply guidance renormalization to input logits."""
+        if self.guidance_renorm >= 1:
+            return x
+        args = {"dim": tuple(range(1, len(x.shape))), "keepdim": True}
+        return x.mul_(cond.norm(**args).div_(x.norm(**args)).clamp(self.guidance_renorm, 1))
+
+    def scale(self, x: torch.Tensor) -> torch.Tensor:
+        """Apply guidance passes to input logits."""
+        if self.guidance_scale <= 1:
+            return x
+        if self.image_guidance_scale:
+            cond, uncond, imgcond = x.chunk(3)
+            x = self.renorm(uncond.add(cond.sub(imgcond).mul_(self.guidance_scale)), cond)
+            return x.add_(imgcond.sub_(uncond).mul_(self.image_guidance_scale))
+        if self.spatiotemporal_guidance_scale:
+            cond, uncond, perturb = x.chunk(3)
+            x = self.renorm(uncond.add_(cond.sub(uncond).mul_(self.guidance_scale)), cond)
+            return x.add_(cond.sub_(perturb).mul_(self.spatiotemporal_guidance_scale))
+        cond,
uncond = x.chunk(2) + return self.renorm(uncond.add_(cond.sub(uncond).mul_(self.guidance_scale)), cond) diff --git a/URSA/diffnext/models/normalization.py b/URSA/diffnext/models/normalization.py new file mode 100644 index 0000000000000000000000000000000000000000..b9f41b67236a39440923dcb6b38f95b6216ace01 --- /dev/null +++ b/URSA/diffnext/models/normalization.py @@ -0,0 +1,62 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ------------------------------------------------------------------------ +"""Normalization Layers.""" + +from typing import Tuple + +import torch +from torch import nn + + +class AdaLayerNormZero(nn.Module): + """Adaptive LayerNorm with residual stats.""" + + def __init__(self, dim, rank=None, num_stats=2, eps=1e-6): + super(AdaLayerNormZero, self).__init__() + self.lora = nn.Linear(dim, rank, bias=False) if rank else nn.Identity() + self.proj = nn.Linear(rank if rank else dim, num_stats * dim) + self.norm = nn.LayerNorm(dim, eps, elementwise_affine=False) if eps else nn.Identity() + self.activation, self.num_stats = nn.SiLU(), num_stats + + def forward(self, x, z) -> Tuple[torch.Tensor, Tuple[torch.Tensor]]: + stats = self.proj(self.lora(self.activation(z))).chunk(self.num_stats, dim=-1) + return self.norm(x).mul(1 + stats[0]).add_(stats[1]), stats[2:] + + +class AdaLayerNorm(AdaLayerNormZero): + """Adaptive LayerNorm.""" + + def __init__(self, dim, rank=None, eps=1e-6): + super(AdaLayerNorm, self).__init__(dim, rank, num_stats=2, eps=eps) + + def forward(self, x, z) -> torch.Tensor: + return super().forward(x, z)[0] + + +class AdaLayerNormSingle(nn.Module): + """Adaptive LayerNorm with shared residual stats.""" + + def __init__(self, dim, num_stats=2, eps=1e-6): + super(AdaLayerNormSingle, self).__init__() + self.bias = nn.Parameter(torch.randn(num_stats, dim) / dim**0.5) + self.norm = nn.LayerNorm(dim, eps, elementwise_affine=False) if eps else nn.Identity() + self.num_stats = num_stats + + def forward(self, x, z) -> Tuple[torch.Tensor, Tuple[torch.Tensor]]: + axis = -2 if z.size(-1) == self.bias.size(-1) else -1 + bias = self.bias.flatten(-1 if z.size(-1) == self.bias.size(-1) else 0) + stats = z.add(bias).chunk(self.num_stats, dim=axis) + return self.norm(x).mul(1 + stats[0]).add_(stats[1]), stats[2:] diff --git a/URSA/diffnext/models/vision_transformer.py b/URSA/diffnext/models/vision_transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..2aff695100b87181fe59b49b6f118aaf68a199a6 --- /dev/null +++ b/URSA/diffnext/models/vision_transformer.py @@ -0,0 +1,146 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ------------------------------------------------------------------------ +"""Vision Transformer.""" + +from typing import Tuple + +import torch +from torch import nn +from torch.utils.checkpoint import checkpoint as apply_ckpt + +from diffnext.models.embeddings import PatchEmbed, RotaryEmbed3D +from diffnext.models.flex_attention import FlexAttentionCausal2D + + +class MLP(nn.Module): + """Two layers MLP.""" + + def __init__(self, dim, mlp_ratio=4): + super(MLP, self).__init__() + self.fc1 = nn.Linear(dim, int(dim * mlp_ratio)) + self.fc2 = nn.Linear(int(dim * mlp_ratio), dim) + self.activation = nn.GELU() + + def forward(self, x) -> torch.Tensor: + return self.fc2(self.activation(self.fc1(x))) + + +class Attention(nn.Module): + """Multihead attention.""" + + def __init__(self, dim, num_heads, qkv_bias=True): + super(Attention, self).__init__() + self.num_heads, self.head_dim = num_heads, dim // num_heads + self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias) + self.proj = nn.Linear(dim, dim) + self.attn_mask, self.cache_kv, self.pe_func, self.flex_attn = None, None, None, None + + def forward(self, x) -> torch.Tensor: + qkv_shape = [-1, x.size(1), 3, self.num_heads, self.head_dim] + q, k, v = self.qkv(x).view(qkv_shape).permute(2, 0, 3, 1, 4).unbind(dim=0) + q, k = (self.pe_func(q), self.pe_func(k)) if self.pe_func else (q, k) + if self.cache_kv is not None and self.cache_kv: + if isinstance(self.cache_kv, list): + k = self.cache_kv[0] = torch.cat([self.cache_kv[0], k], dim=2) + v = self.cache_kv[1] = torch.cat([self.cache_kv[1], v], dim=2) + else: + self.cache_kv = [k, v] + if self.flex_attn and self.flex_attn.offsets: + return self.proj(self.flex_attn(q, k, v).transpose(1, 2).flatten(2)) + o = nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=self.attn_mask) + return self.proj(o.transpose(1, 2).flatten(2)) + + +class Block(nn.Module): + """Transformer block.""" + + def __init__(self, dim, num_heads, mlp_ratio=4, qkv_bias=True): + super(Block, self).__init__() + self.norm1 = nn.LayerNorm(dim) + self.attn = Attention(dim, num_heads, qkv_bias=qkv_bias) + self.norm2 = nn.LayerNorm(dim) + self.mlp = MLP(dim, mlp_ratio=mlp_ratio) + self.attn_checkpointing, self.mlp_checkpointing = False, False + + def forward_attn(self, x) -> torch.Tensor: + return self.norm1(self.attn(x)) + + def forward_mlp(self, x) -> torch.Tensor: + return self.norm2(self.mlp(x)) + + def forward_ckpt(self, x, name) -> torch.Tensor: + if getattr(self, f"{name}_checkpointing", False) and x.requires_grad: + return apply_ckpt(getattr(self, f"forward_{name}"), x, use_reentrant=False) + return getattr(self, f"forward_{name}")(x) + + def forward(self, x, pe_func: callable = None) -> torch.Tensor: + self.attn.pe_func = pe_func + x = self.forward_ckpt(x, "attn").add_(x) + return self.forward_ckpt(x, "mlp").add_(x) + + +class VisionTransformer(nn.Module): + """Vision transformer.""" + + def __init__( + self, + depth, + embed_dim, + num_heads, + mlp_ratio=4, + patch_size=2, + image_size=32, + image_dim=4, + encoder_depth=None, + ): + super(VisionTransformer, self).__init__() + self.embed_dim, self.image_size, 
self.image_dim = embed_dim, image_size, image_dim + self.patch_embed = PatchEmbed(image_dim, embed_dim, patch_size) + self.pos_embed, self.rope = nn.Identity(), RotaryEmbed3D(embed_dim // num_heads) + self.blocks = nn.ModuleList(Block(embed_dim, num_heads, mlp_ratio) for _ in range(depth)) + self.norm, self.mixer = nn.LayerNorm(embed_dim), nn.Identity() + self.encoder_depth = len(self.blocks) // 2 if encoder_depth is None else encoder_depth + self.flex_attn = FlexAttentionCausal2D() + [setattr(blk.attn, "flex_attn", self.flex_attn) for blk in self.blocks] + + def prepare_pe(self, c=None, ids=None, pos=None) -> Tuple[callable, callable]: + pad = 0 if c is None else c.size(1) + pe1 = pe2 = self.rope.get_func(pos, pad) + pe1 = self.rope.get_func(pos, pad, ids.expand(-1, -1, 3)) if ids is not None else pe1 + return pe1, pe2 + + def enable_kvcache(self, mode=True): + [setattr(blk.attn, "cache_kv", mode) for blk in self.blocks] + + def forward(self, x, c=None, prev_ids=None, pos=None) -> torch.Tensor: + x, prev_ids = x if isinstance(x, (tuple, list)) else (x, prev_ids) + prev_ids = prev_ids if self.encoder_depth else None + x = x_masked = self.pos_embed(self.patch_embed(x)) + pe1, pe2 = self.prepare_pe(c, prev_ids, pos) if pos is not None else [None] * 2 + if prev_ids is not None: # Split mask from x. + prev_ids = prev_ids.expand(-1, -1, x.size(-1)) + x = x.gather(1, prev_ids) + x = x if c is None else torch.cat([c, x], dim=1) + for blk in self.blocks[: self.encoder_depth]: + x = blk(x, pe1) + if prev_ids is not None and c is not None: # Split c from x. + c, x = x.split((c.size(1), x.size(1) - c.size(1)), dim=1) + if prev_ids is not None: # Merge mask with x. + x = x_masked.to(dtype=x.dtype).scatter(1, prev_ids, x) + x = x if c is None else torch.cat([c, x], dim=1) + for blk in self.blocks[self.encoder_depth :]: + x = blk(x, pe2) + return self.norm(x if c is None else x[:, c.size(1) :]) diff --git a/URSA/diffnext/schedulers/__init__.py b/URSA/diffnext/schedulers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..7baa8d75fd88dcdfb923dea7e87df477739d27cf --- /dev/null +++ b/URSA/diffnext/schedulers/__init__.py @@ -0,0 +1,16 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ------------------------------------------------------------------------ +"""Schedulers.""" diff --git a/URSA/diffnext/schedulers/scheduling_cfm.py b/URSA/diffnext/schedulers/scheduling_cfm.py new file mode 100644 index 0000000000000000000000000000000000000000..47349737ae02a8798417a6acc6443205c8e46e3d --- /dev/null +++ b/URSA/diffnext/schedulers/scheduling_cfm.py @@ -0,0 +1,140 @@ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +############################################################################## +"""Simple implementation of continuous flow matching schedulers.""" + +import dataclasses +import math + +import numpy as np +import torch + +from diffusers.configuration_utils import ConfigMixin, register_to_config +from diffusers.models.modeling_outputs import BaseOutput +from diffusers.schedulers.scheduling_utils import SchedulerMixin + + +@dataclasses.dataclass +class FlowMatchEulerDiscreteSchedulerOutput(BaseOutput): + """Output for scheduler's `step` function output.""" + + prev_sample: torch.FloatTensor + + +class FlowMatchEulerDiscreteScheduler(SchedulerMixin, ConfigMixin): + + order = 1 + + @register_to_config + def __init__(self, num_train_timesteps=1000, shift=1.0, use_dynamic_shifting=False): + timesteps = np.arange(1, num_train_timesteps + 1, dtype="float32")[::-1] + sigmas, self._shift = timesteps / num_train_timesteps, shift + if not use_dynamic_shifting: + sigmas = shift * sigmas / (1 + (shift - 1) * sigmas) + self.timesteps = torch.as_tensor(sigmas * num_train_timesteps) + self.sigmas = torch.as_tensor(sigmas) + self.sigma_min, self.sigma_max = float(sigmas[-1]), float(sigmas[0]) + self.timestep = self.sigma = None # Training states. + self._begin_index = self._step_index = None # Inference counters. + + @property + def shift(self): + """The value used for shifting.""" + return self._shift + + @property + def step_index(self): + """The index counter for current timestep.""" + return self._step_index + + @property + def begin_index(self): + """The index for the first timestep.""" + return self._begin_index + + def _sigma_to_t(self, sigma): + return sigma * self.config.num_train_timesteps + + def _init_step_index(self, timestep): + if self.begin_index is None: + self._step_index = self.index_for_timestep(timestep) + else: + self._step_index = self._begin_index + + def time_shift(self, mu: float, sigma: float, t: torch.Tensor): + return math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma) + + def set_shift(self, shift: float): + self._shift = shift + + def index_for_timestep(self, timestep, schedule_timesteps=None): + if schedule_timesteps is None: + schedule_timesteps = self.timesteps + indices = (schedule_timesteps == timestep).nonzero() + return indices[1 if len(indices) > 1 else 0].item() + + def sample_timesteps(self, size, device=None): + """Sample the discrete timesteps used for training.""" + dist = torch.normal(0, 1, size, device=device).sigmoid_() + return dist.mul_(self.config.num_train_timesteps).to(dtype=torch.int64) + + def set_timesteps(self, num_inference_steps, mu=None): + """Sets the discrete timesteps used for the diffusion chain.""" + self.num_inference_steps = num_inference_steps + t_max, t_min = self._sigma_to_t(self.sigma_max), self._sigma_to_t(self.sigma_min) + timesteps = np.linspace(t_max, t_min, num_inference_steps, dtype="float32") + sigmas = timesteps / self.config.num_train_timesteps + if self.config.use_dynamic_shifting: + sigmas = self.time_shift(mu, 1.0, sigmas) + else: + sigmas = self.shift * sigmas / (1 + (self.shift - 1) * sigmas) + self.sigmas = 
sigmas.tolist() + [0] + self.timesteps = sigmas * self.config.num_train_timesteps + self._begin_index = self._step_index = None + + def add_noise( + self, + original_samples: torch.Tensor, + noise: torch.Tensor, + timesteps: torch.Tensor, + ): + """Add forward noise to samples for training.""" + dtype, device = original_samples.dtype, original_samples.device + self.timestep = self.timesteps.to(device=device)[timesteps] + self.sigma = self.sigmas.to(device=device, dtype=dtype)[timesteps] + self.sigma = self.sigma.view(timesteps.shape + (1,) * (noise.dim() - timesteps.dim())) + return self.sigma * noise + (1.0 - self.sigma) * original_samples + + def scale_noise(self, sample: torch.Tensor, timestep: float, noise: torch.Tensor): + """Add forward noise to samples for inference.""" + self._init_step_index(timestep) if self.step_index is None else None + sigma = self.sigmas[self.step_index] + return sigma * noise + (1.0 - sigma) * sample + + def step( + self, + model_output: torch.Tensor, + timestep: float, + sample: torch.FloatTensor, + generator: torch.Generator = None, + return_dict=True, + ): + """Predict the sample from the previous timestep.""" + self._init_step_index(timestep) if self.step_index is None else None + dt = self.sigmas[self.step_index + 1] - self.sigmas[self.step_index] + prev_sample = model_output.mul(dt).add_(sample) + self._step_index += 1 + if not return_dict: + return (prev_sample,) + return FlowMatchEulerDiscreteSchedulerOutput(prev_sample=prev_sample) diff --git a/URSA/diffnext/schedulers/scheduling_ddpm.py b/URSA/diffnext/schedulers/scheduling_ddpm.py new file mode 100644 index 0000000000000000000000000000000000000000..aff3f5dc18c05502fded03708c89120167c21b08 --- /dev/null +++ b/URSA/diffnext/schedulers/scheduling_ddpm.py @@ -0,0 +1,354 @@ +# Copyright 2024 UC Berkeley Team and The HuggingFace Team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# DISCLAIMER: This file is strongly influenced by https://github.com/ermongroup/ddim + +import math +from dataclasses import dataclass +from typing import List, Optional, Tuple, Union + +import numpy as np +import torch + +from diffusers.configuration_utils import ConfigMixin, register_to_config +from diffusers.models.modeling_outputs import BaseOutput +from diffusers.utils.torch_utils import randn_tensor +from diffusers.schedulers.scheduling_utils import KarrasDiffusionSchedulers, SchedulerMixin + + +@dataclass +class DDPMSchedulerOutput(BaseOutput): + """Output class for the scheduler's `step` function output.""" + + prev_sample: torch.Tensor + pred_original_sample: Optional[torch.Tensor] = None + + +def betas_for_alpha_bar(num_diffusion_timesteps, max_beta=0.999, alpha_transform_type="cosine"): + """Create a beta schedule that discretizes the given alpha_t_bar function.""" + if alpha_transform_type == "cosine": + alpha_bar_fn = lambda t: math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2 # noqa + elif alpha_transform_type == "exp": + alpha_bar_fn = lambda t: math.exp(t * -12.0) # noqa + else: + raise ValueError(f"Unsupported alpha_transform_type: {alpha_transform_type}") + betas = [] + for i in range(num_diffusion_timesteps): + t1 = i / num_diffusion_timesteps + t2 = (i + 1) / num_diffusion_timesteps + betas.append(min(1 - alpha_bar_fn(t2) / alpha_bar_fn(t1), max_beta)) + return torch.tensor(betas, dtype=torch.float32) + + +def rescale_zero_terminal_snr(betas): + """Rescales betas to have zero terminal SNR.""" + # Convert betas to alphas_bar_sqrt + alphas = 1.0 - betas + alphas_cumprod = torch.cumprod(alphas, dim=0) + alphas_bar_sqrt = alphas_cumprod.sqrt() + # Store old values. + alphas_bar_sqrt_0 = alphas_bar_sqrt[0].clone() + alphas_bar_sqrt_T = alphas_bar_sqrt[-1].clone() + # Shift so the last timestep is zero. + alphas_bar_sqrt -= alphas_bar_sqrt_T + # Scale so the first timestep is back to the old value. + alphas_bar_sqrt *= alphas_bar_sqrt_0 / (alphas_bar_sqrt_0 - alphas_bar_sqrt_T) + # Convert alphas_bar_sqrt to betas + alphas_bar = alphas_bar_sqrt**2 # Revert sqrt + alphas = alphas_bar[1:] / alphas_bar[:-1] # Revert cumprod + alphas = torch.cat([alphas_bar[0:1], alphas]) + betas = 1 - alphas + return betas + + +class DDPMScheduler(SchedulerMixin, ConfigMixin): + """ + `DDPMScheduler` explores the connections between denoising score matching and Langevin dynamics sampling. + + This model inherits from [`SchedulerMixin`] and [`ConfigMixin`]. Check the superclass documentation for the generic + methods the library implements for all schedulers such as loading and saving. + + Args: + num_train_timesteps (`int`, defaults to 1000): + The number of diffusion steps to train the model. + beta_start (`float`, defaults to 0.0001): + The starting `beta` value of inference. + beta_end (`float`, defaults to 0.02): + The final `beta` value. + beta_schedule (`str`, defaults to `"linear"`): + The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from + `linear`, `scaled_linear`, or `squaredcos_cap_v2`. + trained_betas (`np.ndarray`, *optional*): + An array of betas to pass directly to the constructor without using `beta_start` and `beta_end`. + variance_type (`str`, defaults to `"fixed_small"`): + Clip the variance when adding noise to the denoised sample. Choose from `fixed_small`, `fixed_small_log`, + `fixed_large`, `fixed_large_log`, `learned` or `learned_range`. 
+        clip_sample (`bool`, defaults to `True`):
+            Clip the predicted sample for numerical stability.
+        clip_sample_range (`float`, defaults to 1.0):
+            The maximum magnitude for sample clipping. Valid only when `clip_sample=True`.
+        prediction_type (`str`, defaults to `epsilon`, *optional*):
+            Prediction type of the scheduler function; can be `epsilon` (predicts the noise of the diffusion process),
+            `sample` (directly predicts the noisy sample) or `v_prediction` (see section 2.4 of [Imagen
+            Video](https://imagen.research.google/video/paper.pdf) paper).
+        thresholding (`bool`, defaults to `False`):
+            Whether to use the "dynamic thresholding" method. This is unsuitable for latent-space diffusion models such
+            as Stable Diffusion.
+        dynamic_thresholding_ratio (`float`, defaults to 0.995):
+            The ratio for the dynamic thresholding method. Valid only when `thresholding=True`.
+        sample_max_value (`float`, defaults to 1.0):
+            The threshold value for dynamic thresholding. Valid only when `thresholding=True`.
+        timestep_spacing (`str`, defaults to `"leading"`):
+            The way the timesteps should be scaled. Refer to Table 2 of the [Common Diffusion Noise Schedules and
+            Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) for more information.
+        steps_offset (`int`, defaults to 0):
+            An offset added to the inference steps, as required by some model families.
+        rescale_betas_zero_snr (`bool`, defaults to `False`):
+            Whether to rescale the betas to have zero terminal SNR. This enables the model to generate very bright and
+            dark samples instead of limiting it to samples with medium brightness. Loosely related to
+            [`--offset_noise`](https://github.com/huggingface/diffusers/blob/74fd735eb073eb1d774b1ab4154a0876eb82f055/examples/dreambooth/train_dreambooth.py#L506).
+ """ # noqa + + _compatibles = [e.name for e in KarrasDiffusionSchedulers] + order = 1 + + @register_to_config + def __init__( + self, + num_train_timesteps: int = 1000, + beta_start: float = 0.0001, + beta_end: float = 0.02, + beta_schedule: str = "linear", + trained_betas: Optional[Union[np.ndarray, List[float]]] = None, + variance_type: str = "fixed_small", + clip_sample: bool = True, + prediction_type: str = "epsilon", + thresholding: bool = False, + dynamic_thresholding_ratio: float = 0.995, + clip_sample_range: float = 1.0, + sample_max_value: float = 1.0, + timestep_spacing: str = "leading", + steps_offset: int = 0, + rescale_betas_zero_snr: int = False, + ): + if trained_betas is not None: + self.betas = torch.tensor(trained_betas, dtype=torch.float32) + elif beta_schedule == "linear": + self.betas = torch.linspace( + beta_start, beta_end, num_train_timesteps, dtype=torch.float32 + ) + elif beta_schedule == "scaled_linear": + a, b = beta_start**0.5, beta_end**0.5 + self.betas = torch.linspace(a, b, num_train_timesteps, dtype=torch.float32) ** 2 + elif beta_schedule == "squaredcos_cap_v2": # Glide cosine schedule + self.betas = betas_for_alpha_bar(num_train_timesteps) + elif beta_schedule == "sigmoid": # GeoDiff sigmoid schedule + betas = torch.linspace(-6, 6, num_train_timesteps) + self.betas = torch.sigmoid(betas) * (beta_end - beta_start) + beta_start + else: + raise NotImplementedError(f"{beta_schedule} is not implemented for {self.__class__}") + # Rescale for zero SNR + if rescale_betas_zero_snr: + self.betas = rescale_zero_terminal_snr(self.betas) + self.alphas = 1.0 - self.betas + self.alphas_cumprod = torch.cumprod(self.alphas, dim=0) + self.one = torch.tensor(1.0) + self.init_noise_sigma = 1.0 + self.custom_timesteps = False + self.num_inference_steps = None + self.timesteps = torch.from_numpy(np.arange(num_train_timesteps)[::-1].copy()) + self.variance_type = variance_type + + def scale_model_input( + self, sample: torch.Tensor, timestep: Optional[int] = None + ) -> torch.Tensor: + """Scale the denoising model input depending on the current timestep.""" + return sample + + def sample_timesteps(self, size, device=None): + return torch.randint(0, self.config.num_train_timesteps, size, device=device) + + def set_timesteps( + self, + num_inference_steps: Optional[int] = None, + device: Union[str, torch.device] = None, + timesteps: Optional[List[int]] = None, + ): + """Sets the discrete timesteps used for the diffusion chain (to be run before inference).""" + if num_inference_steps is not None and timesteps is not None: + raise ValueError("Can only pass one of `num_inference_steps` or `custom_timesteps`.") + self.custom_timesteps = timesteps is not None + self.num_inference_steps = num_inference_steps + if timesteps is not None: + timesteps = np.array(timesteps, dtype=np.int64) + # See Table 2. 
of https://arxiv.org/abs/2305.08891 + elif self.config.timestep_spacing == "linspace": + timesteps = np.linspace(0, self.config.num_train_timesteps - 1, num_inference_steps) + timesteps = timesteps.round()[::-1].copy().astype(np.int64) + elif self.config.timestep_spacing == "leading": + step_ratio = self.config.num_train_timesteps // self.num_inference_steps + timesteps = np.arange(0, num_inference_steps) * step_ratio + timesteps = timesteps.round()[::-1].copy().astype(np.int64) + self.config.steps_offset + elif self.config.timestep_spacing == "trailing": + step_ratio = self.config.num_train_timesteps / self.num_inference_steps + timesteps = np.arange(self.config.num_train_timesteps, 0, -step_ratio) + timesteps = timesteps.round().astype(np.int64) - 1 + else: + raise ValueError(f"{self.config.timestep_spacing} is not supported.") + self.timesteps = torch.as_tensor(timesteps, device=device) + + def _get_variance(self, t, predicted_variance=None): + prev_t = self.previous_timestep(t) + alpha_prod_t = self.alphas_cumprod[t] + alpha_prod_t_prev = self.alphas_cumprod[prev_t] if prev_t >= 0 else self.one + current_beta_t = 1 - alpha_prod_t / alpha_prod_t_prev + # For t > 0, compute predicted variance βt (see formula (6) and (7) from https://arxiv.org/pdf/2006.11239.pdf) # noqa + # and sample from it to get previous sample + # x_{t-1} ~ N(pred_prev_sample, variance) == add variance to pred_sample + variance = (1 - alpha_prod_t_prev) / (1 - alpha_prod_t) * current_beta_t + # we always take the log of variance, so clamp it to ensure it's not 0 + variance = torch.clamp(variance, min=1e-20) + if self.config.variance_type == "fixed_small_log": # for rl-diffuser + return torch.exp(0.5 * variance.log()) + elif self.config.variance_type == "fixed_large": + return current_beta_t + elif self.config.variance_type == "fixed_large_log": # Glide max_log + return torch.log(current_beta_t) + elif self.config.variance_type == "learned": + return predicted_variance + elif self.config.variance_type == "learned_range": + frac = (predicted_variance + 1) / 2 + min_log, max_log = variance.log(), torch.log(current_beta_t) + return frac * max_log + (1 - frac) * min_log + return variance + + def step( + self, + model_output: torch.Tensor, + timestep: int, + sample: torch.Tensor, + generator=None, + return_dict: bool = True, + ) -> Union[DDPMSchedulerOutput, Tuple]: + """ + Predict the sample from the previous timestep by reversing the SDE. This function propagates the diffusion + process from the learned model outputs (most often the predicted noise). + + Args: + model_output (`torch.Tensor`): + The direct output from learned diffusion model. + timestep (`float`): + The current discrete timestep in the diffusion chain. + sample (`torch.Tensor`): + A current instance of a sample created by the diffusion process. + generator (`torch.Generator`, *optional*): + A random number generator. + return_dict (`bool`, *optional*, defaults to `True`): + Whether or not to return a [`~schedulers.scheduling_ddpm.DDPMSchedulerOutput`] or `tuple`. + + Returns: + [`~schedulers.scheduling_ddpm.DDPMSchedulerOutput`] or `tuple`: + If return_dict is `True`, [`~schedulers.scheduling_ddpm.DDPMSchedulerOutput`] is returned, otherwise a + tuple is returned where the first element is the sample tensor. 
+ + """ # noqa + t = timestep + prev_t = self.previous_timestep(t) + + predicted_variance = None + if self.variance_type in ("learned", "learned_range"): + if model_output.shape[1] == sample.shape[1] * 2: + model_output, predicted_variance = model_output.chunk(2, dim=1) + + # 1. compute alphas, betas + alpha_prod_t = self.alphas_cumprod[t] + alpha_prod_t_prev = self.alphas_cumprod[prev_t] if prev_t >= 0 else self.one + beta_prod_t = 1 - alpha_prod_t + beta_prod_t_prev = 1 - alpha_prod_t_prev + current_alpha_t = alpha_prod_t / alpha_prod_t_prev + current_beta_t = 1 - current_alpha_t + + # 2. compute predicted original sample from predicted noise also called + # "predicted x_0" of formula (15) from https://arxiv.org/pdf/2006.11239.pdf + if self.config.prediction_type == "epsilon": + pred_sample = (sample - beta_prod_t**0.5 * model_output) / alpha_prod_t**0.5 + elif self.config.prediction_type == "sample": + pred_sample = model_output + elif self.config.prediction_type == "v_prediction": + pred_sample = alpha_prod_t**0.5 * sample - beta_prod_t**0.5 * model_output + else: + raise ValueError(f"Unsupported prediction type given as {self.config.prediction_type}.") + + # 4. Compute coefficients for pred_sample x_0 and current sample x_t + # See formula (7) from https://arxiv.org/pdf/2006.11239.pdf + pred_sample_coeff = alpha_prod_t_prev**0.5 * current_beta_t / beta_prod_t + current_sample_coeff = current_alpha_t**0.5 * beta_prod_t_prev / beta_prod_t + + # 5. Compute predicted previous sample µ_t + # See formula (7) from https://arxiv.org/pdf/2006.11239.pdf + prev_sample = pred_sample_coeff * pred_sample + current_sample_coeff * sample + + # 6. Add noise + if t > 0: + device, dtype = model_output.device, model_output.dtype + noise = randn_tensor(sample.shape, generator=generator, device=device, dtype=dtype) + if self.variance_type == "fixed_small_log": + variance = self._get_variance(t, predicted_variance) + elif self.variance_type == "learned_range": + variance = self._get_variance(t, predicted_variance).mul(0.5).exp() + else: + variance = self._get_variance(t, predicted_variance) ** 0.5 + prev_sample.add_(noise.mul_(variance)) + + if not return_dict: + return (prev_sample,) + return DDPMSchedulerOutput(prev_sample=prev_sample) + + def previous_timestep(self, timestep): + if self.custom_timesteps: + index = (self.timesteps == timestep).nonzero(as_tuple=True)[0][0] + if index == self.timesteps.shape[0] - 1: + return torch.tensor(-1) + return self.timesteps[index + 1] + num_inference_steps = self.num_inference_steps or self.config.num_train_timesteps + return timestep - self.config.num_train_timesteps // num_inference_steps + + def add_noise( + self, original_samples: torch.Tensor, noise: torch.Tensor, timesteps: torch.Tensor + ) -> torch.Tensor: + timesteps = timesteps.to(device=original_samples.device) + self.alphas_cumprod = self.alphas_cumprod.to(device=original_samples.device) + alphas_cumprod = self.alphas_cumprod.to(dtype=original_samples.dtype) + sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5 + sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5 + expand_shape = timesteps.shape + (1,) * (noise.dim() - timesteps.dim()) + sqrt_alpha_prod = sqrt_alpha_prod.view(expand_shape) + sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.view(expand_shape) + return sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise + + def get_velocity( + self, sample: torch.Tensor, noise: torch.Tensor, timesteps: torch.Tensor + ) -> torch.Tensor: + timesteps = 
timesteps.to(sample.device)
+        self.alphas_cumprod = self.alphas_cumprod.to(device=sample.device)
+        alphas_cumprod = self.alphas_cumprod.to(dtype=sample.dtype)
+        sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5
+        sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5
+        expand_shape = timesteps.shape + (1,) * (noise.dim() - timesteps.dim())
+        sqrt_alpha_prod = sqrt_alpha_prod.view(expand_shape)
+        sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.view(expand_shape)
+        return sqrt_alpha_prod * noise - sqrt_one_minus_alpha_prod * sample
+
+    def __len__(self):
+        return self.config.num_train_timesteps
diff --git a/URSA/diffnext/schedulers/scheduling_dfm.py b/URSA/diffnext/schedulers/scheduling_dfm.py
new file mode 100644
index 0000000000000000000000000000000000000000..6bb49b96c938e10a4f72c7b677fc131a655f69ff
--- /dev/null
+++ b/URSA/diffnext/schedulers/scheduling_dfm.py
@@ -0,0 +1,347 @@
+# ------------------------------------------------------------------------
+# Copyright (c) 2024-present, BAAI. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ------------------------------------------------------------------------
+"""Simple implementation of discrete flow matching schedulers."""
+
+import dataclasses
+import os
+from typing import Union
+from typing_extensions import Self
+
+from diffusers.configuration_utils import ConfigMixin, register_to_config
+from diffusers.models.modeling_outputs import BaseOutput
+from diffusers.schedulers.scheduling_utils import SchedulerMixin
+import torch
+
+
+@dataclasses.dataclass
+class KineticOptimalSchedulerOutput(BaseOutput):
+    """Output for scheduler's `step` function output."""
+
+    prev_sample: torch.LongTensor
+
+
+class DiscreteProbPath(object):
+    """Define a general discrete probability path."""
+
+    def __init__(self, emb):
+        """Create a ``DiscreteProbPath``.
+
+        Args:
+            emb (Union[torch.Tensor, torch.nn.Embedding])
+                The codebook embeddings.
+        """
+        self.generator = None
+        self.emb = emb.weight if isinstance(emb, torch.nn.Embedding) else emb
+
+    def categorical(self, prob) -> torch.Tensor:
+        """Categorical sampling according to weights in the last dimension.
+
+        Args:
+            prob (torch.Tensor)
+                The sample token probability, shape (bsz, ..., codebook_size).
+
+        Returns:
+            torch.Tensor: The sample token index, shape (bsz, ...).
+        """
+        return prob.flatten(0, -2).multinomial(1, generator=self.generator).view(*prob.shape[:-1])
+
+
+class MixtureDiscreteProbPath(DiscreteProbPath):
+    """Define a mixture discrete probability path."""
+
+    def sample(self, x_1, t: Union[float, torch.Tensor]) -> torch.Tensor:
+        """Sample from the affine probability path.
+
+        Args:
+            x_1 (torch.Tensor)
+                The target token index, shape (bsz, ...).
+            t (float or torch.Tensor)
+                The timestep ``t``, shape (bsz,).
+
+        Returns:
+            torch.Tensor: The sample token index at time t, shape (bsz, ...).
+ """ + t = t.to(self.emb).view([-1] + [1] * (x_1.dim() - 1)) if hasattr(t, "cpu") else t + x_0 = x_1.new_empty(x_1.shape).random_(to=self.emb.shape[0], generator=self.generator) + return x_0.where(t.new_empty(x_1.shape).uniform_(generator=self.generator).lt(1 - t), x_1) + + def get_velocity(self, logits, x_t, t: float, x_1=None) -> torch.Tensor: + """Return the velocity by converting the factorized posterior. + + Args: + logits (torch.Tensor) + The sample token logits at time t+1, shape (bsz, ..., codebook_size). + x_t (torch.Tensor) + The sample token index at time t, shape (bsz, ...). + t (float) + The timestep ``t``. + x_1 (torch.Tensor, optional): + The sample token index at time t+1, shape (bsz, ...). + + Returns: + torch.Tensor: The velocity ``v``. + """ + x_1 = self.categorical(logits.softmax(-1)) if x_1 is None else x_1 + return logits.zero_().scatter_(-1, x_1.unsqueeze(-1), 1 / (1 - t)) + + +class MetricDiscreteProbPath(DiscreteProbPath): + """Define a metric-induced discrete probability path.""" + + def __init__(self, emb, alpha=0.9, c=3, eps=1e-5): + """Create a ``MetricDiscreteProbPath``. + + Args: + emb (Union[torch.Tensor, torch.nn.Embedding]) + The codebook embeddings. + alpha (float) + The value to ``alpha``. + c (float) + The value to ``c``. + eps (float, *optional*, defaults to 1e-5): + A small value to clip the L2 normalization denominator. + """ + self.alpha, self.c, self.eps, self.generator = alpha, c, eps, None + emb = emb.weight if isinstance(emb, torch.nn.Embedding) else emb + self.emb = torch.nn.functional.normalize(emb, dim=-1, eps=eps) + self.emb_sumsq = self.emb.square().sum(-1) + self.emb_mul2t = self.emb.mul(2).T.contiguous() + + def get_dist(self, emb_1: torch.Tensor, emb_2: torch.Tensor = None) -> torch.Tensor: + """Return the distance between two input embeddings. + + Args: + emb_1 (torch.Tensor) + The input1 embeddings, shape (bsz, ..., dim). + emb_2 (torch.Tensor, optional) + The input2 embeddings, shape (bsz, ..., dim) or (bsz, ..., codebook_size). + + Returns: + torch.Tensor: The distance, shape (bsz, ..., 1) or (bsz, ..., codebook_size). + """ + emb_1 = torch.nn.functional.normalize(emb_1, dim=-1, eps=self.eps) + if emb_2 is None or emb_1.size() != emb_2.size(): # Distance between input and codebook. + emb_1_sumsq, emb_2_sumsq = emb_1.square().sum(-1, True), self.emb_sumsq + return torch.add(emb_1_sumsq, emb_2_sumsq, out=emb_2).sub_(emb_1 @ self.emb_mul2t) + emb_2 = torch.nn.functional.normalize(emb_2, dim=-1, eps=self.eps) + return emb_1.sub(emb_2).abs_().square_().sum(-1, keepdim=True) + + def get_prob(self, emb: torch.Tensor, t: Union[float, torch.Tensor]) -> torch.Tensor: + """Return the metric-induced probability. + + Args: + emb (torch.Tensor) + The input embeddings, shape (bsz, ..., dim). + t (float or torch.Tensor) + The timestep ``t``, shape (bsz,). + + Returns: + torch.Tensor: The probability at timestep ``t``, shape (bsz, ..., codebook_size). + """ + beta = self.c * (t / (1 - t)) ** self.alpha + beta = beta.to(emb).view([-1] + [1] * (emb.dim() - 1)) if hasattr(t, "cpu") else beta + return self.get_dist(emb).mul_(-beta).softmax(-1) + + def get_prob_by_dist(self, dist: torch.Tensor, t: Union[float, torch.Tensor]) -> torch.Tensor: + """Return the metric-induced probability by distance. + + Args: + dist (torch.Tensor) + The distance, shape (bsz, ..., codebook_size). + t (float or torch.Tensor) + The timestep ``t``, shape (bsz,). + + Returns: + torch.Tensor: The probability at timestep ``t``, shape (bsz, ..., codebook_size). 
+ """ + beta = self.c * (t / (1 - t)) ** self.alpha + beta = beta.to(dist).view([-1] + [1] * (dist.dim() - 1)) if hasattr(t, "cpu") else beta + return dist.mul(-beta).softmax(-1) + + def sample(self, x_1, t: Union[float, torch.Tensor]) -> torch.Tensor: + """Sample from the affine probability path. + + Args: + x_1 (torch.Tensor) + The target token index, shape (bsz, ...). + t (float or torch.Tensor) + The timestep ``t``, shape (bsz,). + + Returns: + torch.Tensor: The sample token index at time t, shape (bsz, ...). + """ + return self.categorical(self.get_prob(self.emb[x_1], t)) + + def get_velocity(self, logits, x_t, t: float, x_1=None) -> torch.Tensor: + """Return the velocity by converting the factorized posterior. + + Args: + logits (torch.Tensor) + The sample token logits at time t+1, shape (bsz, ..., codebook_size). + x_t (torch.Tensor) + The sample token index at time t, shape (bsz, ...). + t (float) + The timestep ``t``. + x_1 (torch.Tensor, optional): + The sample token index at time t+1, shape (bsz, ...). + + Returns: + torch.Tensor: The velocity ``v``, shape (bsz, ..., codebook_size). + """ + numerator = self.c * self.alpha * (t ** (self.alpha - 1)) if t > 0 else 0 + d_beta_t = numerator / (1 - t) ** (self.alpha + 1) + emb_x_1 = self.emb[self.categorical(logits.softmax(-1)) if x_1 is None else x_1] + dist_x_1_x = self.get_dist(emb_x_1, logits) # (bsz, ..., codebook_size) + prob_x_1_x = self.get_prob_by_dist(dist_x_1_x, t) # (bsz, ..., codebook_size) + dist_x_t_x_1 = self.get_dist(self.emb[x_t], emb_x_1) # (bsz, ..., 1) + dist = torch.nn.functional.relu(dist_x_1_x.sub_(dist_x_t_x_1).neg_(), inplace=True) + return prob_x_1_x.mul_(d_beta_t).mul_(dist) # (bsz, ..., codebook_size) + + +class KineticOptimalScheduler(SchedulerMixin, ConfigMixin): + """Kinetic optimal scheduler with general discrete paths.""" + + @register_to_config + def __init__(self, alpha=None, c=None, shift=1.0, eps=1e-5, **kwargs): + self.alpha, self.c, self.shift, self.eps = alpha, c, shift, eps + self.init_args, self.path, self.codebook_size = kwargs or {}, None, 0 + self.init_args.setdefault("shift", shift) if shift != 1 else None + + def __repr__(self) -> str: + """Return the extra representation of this scheduler.""" + s = f"{self.__class__.__name__}" + if self.alpha is None: # Fallback to ``MixtureDiscreteProbPath``. 
+ return s + "(shift={shift})".format(**self.__dict__) + return s + "(alpha={alpha}, c={c}, shift={shift})".format(**self.__dict__) + + @classmethod + def from_pretrained(cls, pretrained_path, device=None, dtype=None, **kwargs) -> Self: + """Instantiate the scheduler from a pretrained model vocabulary.""" + return KineticOptimalScheduler().load_pretrained(pretrained_path, device, dtype, **kwargs) + + def load_pretrained(self, pretrained_path=None, device=None, dtype=None, **kwargs) -> Self: + """Load the scheduler from a pretrained model vocabulary.""" + pretrained_path = self.init_args.get("pretrained_path", None) or pretrained_path + pretrained_args = super().from_pretrained(pretrained_path, **kwargs).__dict__ + pretrained_args.update({"init_args": self.init_args, **self.init_args}) + self.__dict__.update(pretrained_args) + model_file = os.path.join(pretrained_path, "scheduler_model.pth") + emb = torch.load(model_file, weights_only=False)["path.emb"] + emb = emb.to(device).to(dtype=dtype or torch.float16) + self.path = MetricDiscreteProbPath(emb=emb, alpha=self.alpha, c=self.c, eps=self.eps) + self.path = MixtureDiscreteProbPath(emb=emb) if self.alpha is None else self.path + self.codebook_size = self.path.emb.size(0) + return self + + def to(self, device=None, dtype=None) -> Self: + """Convert to given device and dtype.""" + for k, v in self.path.__dict__.items(): + self.path.__dict__[k] = v.to(device, dtype) if isinstance(v, torch.Tensor) else v + return self + + def sample_timesteps(self, size, device=None, generator=None) -> torch.Tensor: + """Sample a batch of timesteps for training. + + Args: + size (Tuple[int]) + The sample size of timesteps. + device (torch.device, optional) + The output device. + generator (torch.Generator, optional): + The random generator. + """ + sigma = 1 - torch.rand(size, device=device, generator=generator).mul_(0.999) + return 1 - self.shift * sigma / (1 + (self.shift - 1) * sigma) + + def set_timesteps(self, num_inference_steps, *args, **kwargs): + """Set the inference timesteps for sampling. + + Args: + num_inference_steps (int) + The number of inference steps. + """ + self.num_inference_steps = num_inference_steps + self.timesteps = torch.arange(num_inference_steps).tolist() + + def add_noise(self, original_samples, timesteps, generator=None) -> torch.Tensor: + """Add forward noise to samples. + + Args: + original_samples (torch.Tensor) + The sample token index, shape (bsz, ...). + t (float or torch.Tensor) + The timestep ``t``, shape (bsz,). + generator (torch.Generator, optional): + The random generator. + + Returns: + torch.Tensor: The sample token index at time t, shape (bsz, ...). + """ + self.path.generator = generator if generator else self.path.generator + return self.path.sample(original_samples, timesteps) + + def timestep_to_t(self, timestep) -> float: + """Return the ``t`` for given timestep. + + Args: + timestep (int) + The discrete timestep index. + + Returns: + float: The continuous timestep in [0, 1). + """ + sigma = 1 - self.timesteps[timestep] / self.num_inference_steps + return 1 - self.shift * sigma / (1 + (self.shift - 1) * sigma) + + def step( + self, + model_output, + timestep, + sample, + generator=None, + return_dict=True, + ) -> KineticOptimalSchedulerOutput: + """Predict the sample from the previous timestep. + + Args: + model_output (torch.Tensor) + The sample token logits at time t+1, shape (bsz, ..., codebook_size). + timestep (int) + The discrete timestep index. 
+            sample (torch.Tensor)
+                The sample token index at time t, shape (bsz, ...).
+            generator (torch.Generator, optional):
+                The random generator.
+            return_dict (bool, optional)
+                Whether to return the output in a dict.
+
+        Returns:
+            torch.Tensor: The sample token index at time t+1, shape (bsz, ...).
+        """
+        self.path.generator = generator if generator else self.path.generator
+        if timestep == self.num_inference_steps - 1:
+            prev_sample = self.path.categorical(model_output.softmax(-1))
+        else:
+            t = self.timestep_to_t(timestep)
+            dt = self.timestep_to_t(timestep + 1) - t
+            v = self.path.get_velocity(model_output, sample, t)
+            u_dist = torch.empty_like(sample, dtype=v.dtype).uniform_(generator=generator)
+            jump_thresh = 1 - v.scatter_(-1, sample[..., None], 0).sum(-1).mul_(-dt).exp_()
+            prev_sample, jump_index = sample.clone(), u_dist < jump_thresh
+            prev_sample[jump_index] = self.path.categorical(v[jump_index])
+        if not return_dict:
+            return (prev_sample,)
+        return KineticOptimalSchedulerOutput(prev_sample=prev_sample)
diff --git a/URSA/diffnext/utils/__init__.py b/URSA/diffnext/utils/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..6464320eaa9e83e6b71182dcdc99477a9f1bbb45
--- /dev/null
+++ b/URSA/diffnext/utils/__init__.py
@@ -0,0 +1,19 @@
+# ------------------------------------------------------------------------
+# Copyright (c) 2024-present, BAAI. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ------------------------------------------------------------------------
+"""Utilities."""
+
+from diffnext.utils.export_utils import export_to_image
+from diffnext.utils.export_utils import export_to_video
diff --git a/URSA/diffnext/utils/accelerate_utils.py b/URSA/diffnext/utils/accelerate_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..ee154d16a1ded91cf53a06e0f7a46e53ae07e4f1
--- /dev/null
+++ b/URSA/diffnext/utils/accelerate_utils.py
@@ -0,0 +1,105 @@
+# ------------------------------------------------------------------------
+# Copyright (c) 2024-present, BAAI. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
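The `step` method of `KineticOptimalScheduler` above is a jump-process update: the velocity mass leaving the current token is integrated over `dt`, a jump fires with probability 1 - exp(-dt * sum of the off-diagonal velocities), and the landing token is drawn from the velocity as a categorical. The forward (training-time) corruption of the mixture path is much simpler; a small sketch, assuming a hypothetical random codebook:

    import torch

    emb = torch.randn(64, 16)            # hypothetical codebook: 64 tokens, dim 16
    path = MixtureDiscreteProbPath(emb)
    x_1 = torch.randint(0, 64, (2, 10))  # target tokens, shape (bsz, seq)
    t = torch.tensor([0.9, 0.1])         # per-sample timesteps in [0, 1)
    x_t = path.sample(x_1, t)            # each token keeps its target value with
                                         # probability t, else becomes a uniform token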
+# ------------------------------------------------------------------------
+"""Accelerate utilities."""
+
+import atexit
+import functools
+import logging
+import os
+import sys
+import time
+
+import accelerate
+import torch
+import wandb
+
+from diffnext.utils.omegaconf_utils import flatten_omega_conf
+
+from accelerate import Accelerator
+from accelerate.utils import DistributedDataParallelKwargs
+
+
+def build_accelerator(config, **kwargs) -> accelerate.Accelerator:
+    """Build accelerator."""
+
+    kwargs_handlers = []
+
+    # Enable unused-parameter detection for plain DDP.
+    ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
+    kwargs_handlers.append(ddp_kwargs)
+
+    accelerator = accelerate.Accelerator(
+        log_with=kwargs.get("log_with", None),
+        mixed_precision=config.training.mixed_precision,
+        gradient_accumulation_steps=config.training.gradient_accumulation_steps,
+        kwargs_handlers=kwargs_handlers,
+    )
+    if hasattr(accelerator.state.deepspeed_plugin, "deepspeed_config"):
+        import deepspeed
+
+        deepspeed.logger.setLevel(kwargs.get("deepspeed_log_lvl", "WARNING"))
+        # Dummy size to avoid the raised errors.
+        accelerator.state.deepspeed_plugin.deepspeed_config["train_micro_batch_size_per_gpu"] = 1
+    return accelerator
+
+
+def build_wandb(config, accelerator):
+    """Build wandb for accelerator."""
+    if "wandb" not in config or not accelerator.is_main_process:
+        return
+    config.wandb = config.wandb or type(config)({})
+    old_run_id = config.wandb.get("run_id", None)
+    config.wandb.run_id = run_id = old_run_id or wandb.util.generate_id()
+    init_kwargs = dict(id=run_id, name=config.experiment.name, resume=old_run_id is not None)
+    init_kwargs["config"] = {k: v for k, v in flatten_omega_conf(config, True)}
+    accelerator.init_trackers(config.experiment.project, init_kwargs={"wandb": init_kwargs})
+
+
+def get_ddp_shards(accelerator) -> dict:
+    """Return the shard arguments for simple DDP."""
+    return {"shard_id": accelerator.process_index, "num_shards": accelerator.num_processes}
+
+
+def precision_to_dtype(precision="bf16") -> torch.dtype:
+    """Convert precision string to torch dtype."""
+    str_dict = {"fp16": "float16", "bf16": "bfloat16", "fp32": "float32"}
+    return getattr(torch, str_dict.get(precision.lower(), "float32"))
+
+
+@functools.lru_cache()
+def set_logger(output_dir=None, name="diffnext", level="INFO", accelerator=None):
+    """Set logger."""
+
+    @functools.lru_cache(maxsize=None)
+    def cached_log_stream(filename):
+        """Register a cached filename."""
+        f = open(filename, "a")
+        atexit.register(f.close)
+        return f
+
+    logger = logging.getLogger(name)
+    logger.propagate, _ = False, logger.setLevel(level)
+    fmt = "%(asctime)s %(levelname)s %(filename)s:%(lineno)d] %(message)s"
+    formatter = logging.Formatter(fmt, datefmt="%m/%d %H:%M:%S")
+    ch = logging.StreamHandler(sys.stdout)
+    ch.setLevel(level), ch.setFormatter(formatter), logger.addHandler(ch)
+    output_dir = "" if (accelerator and not accelerator.is_main_process) else output_dir
+    if output_dir:
+        os.makedirs(os.path.join(output_dir, "logs"), exist_ok=True)
+        log_file = time.strftime("%Y%m%d_%H%M%S", time.localtime(time.time())) + ".log"
+        fh = logging.StreamHandler(cached_log_stream(os.path.join(output_dir, "logs", log_file)))
+        fh.setLevel(level), fh.setFormatter(formatter), logger.addHandler(fh)
+    return accelerate.logging.MultiProcessAdapter(logger, {}) if accelerator else logger
diff --git a/URSA/diffnext/utils/export_utils.py b/URSA/diffnext/utils/export_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..72dd8d023617fa52e3797b475e35766cabe3639a
--- /dev/null
+++ b/URSA/diffnext/utils/export_utils.py
@@ -0,0 +1,47 @@
+# ------------------------------------------------------------------------
+# Copyright (c) 2024-present, BAAI. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ------------------------------------------------------------------------
+"""Export utilities."""
+
+import tempfile
+
+try:
+    import imageio
+except ImportError:
+    imageio = None
+import PIL.Image
+
+
+def export_to_image(image, output_image_path=None, suffix=".webp", quality=100):
+    """Export to image."""
+    if output_image_path is None:
+        output_image_path = tempfile.NamedTemporaryFile(suffix=suffix).name
+    if isinstance(image, PIL.Image.Image):
+        image.save(output_image_path, quality=quality)
+    else:
+        PIL.Image.fromarray(image).save(output_image_path, quality=quality)
+    return output_image_path
+
+
+def export_to_video(video_frames, output_video_path=None, fps=12):
+    """Export to video."""
+    if output_video_path is None:
+        output_video_path = tempfile.NamedTemporaryFile(suffix=".mp4").name
+    if imageio is None:
+        raise ImportError("Failed to import `imageio`, which is required for video export.")
+    with imageio.get_writer(output_video_path, fps=fps) as writer:
+        for frame in video_frames:
+            writer.append_data(frame)
+    return output_video_path
diff --git a/URSA/diffnext/utils/omegaconf_utils.py b/URSA/diffnext/utils/omegaconf_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..cb6c0792596961763c4e7e386cf63262ba9dca79
--- /dev/null
+++ b/URSA/diffnext/utils/omegaconf_utils.py
@@ -0,0 +1,102 @@
+# ------------------------------------------------------------------------
+# Copyright (c) 2024-present, BAAI. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ------------------------------------------------------------------------
+"""Omegaconf utilities."""
+
+import importlib
+import json
+from typing import List
+
+import omegaconf
+
+
+class OmegaConfEncoder(json.JSONEncoder):
+    """Custom JSON encoder for omegaconf objects."""
+
+    def default(self, obj):
+        if isinstance(obj, (omegaconf.ListConfig, omegaconf.DictConfig)):
+            return omegaconf.OmegaConf.to_container(obj, resolve=True)
+        return super().default(obj)
+
+
+def get_config() -> omegaconf.DictConfig:
+    """Return omega configurations from CLI."""
+    cli_conf = omegaconf.OmegaConf.from_cli()
+    omegaconf.OmegaConf.register_new_resolver("eval", eval)  # Register ``eval`` func.
+    return omegaconf.OmegaConf.merge(omegaconf.OmegaConf.load(cli_conf.config), cli_conf)
+
+
+def save_config(config: omegaconf.DictConfig, f):
+    """Save config to YAML format string."""
+    omegaconf.OmegaConf.save(config, f)
+
+
+def config_to_yaml(config: omegaconf.DictConfig) -> str:
+    """Dump config to YAML format string."""
+    return omegaconf.OmegaConf.to_yaml(config)
+
+
+def config_to_class(config: omegaconf.DictConfig) -> object:
+    """Return the class object from config."""
+
+    def get_obj_from_str(string, reload=False):
+        module, cls = string.rsplit(".", 1)
+        if reload:
+            module_imp = importlib.import_module(module)
+            importlib.reload(module_imp)
+        return getattr(importlib.import_module(module, package=None), cls)
+
+    if not config:
+        return None
+    if "target" not in config:
+        raise KeyError("Expected key `target` to instantiate.")
+    return get_obj_from_str(config["target"])
+
+
+def config_to_object(config: omegaconf.DictConfig, **kwargs) -> object:
+    """Instantiate an object from config."""
+    if not config:
+        return None
+    kwargs.update(config.get("params", dict()))
+    return config_to_class(config)(**kwargs)
+
+
+def flatten_omega_conf(cfg, resolve=True) -> List:
+    """Flatten omega configurations."""
+    ret = []
+
+    def handle_dict(key, value, resolve):
+        return [(f"{key}.{k1}", v1) for k1, v1 in flatten_omega_conf(value, resolve=resolve)]
+
+    def handle_list(key, value, resolve):
+        return [(f"{key}.{idx}", v1) for idx, v1 in flatten_omega_conf(value, resolve=resolve)]
+
+    if isinstance(cfg, omegaconf.DictConfig):
+        for k, v in cfg.items_ex(resolve=resolve):
+            if isinstance(v, omegaconf.DictConfig):
+                ret.extend(handle_dict(k, v, resolve=resolve))
+            elif isinstance(v, omegaconf.ListConfig):
+                ret.extend(handle_list(k, v, resolve=resolve))
+            else:
+                ret.append((str(k), v))
+    elif isinstance(cfg, omegaconf.ListConfig):
+        for idx, v in enumerate(cfg._iter_ex(resolve=resolve)):
+            if isinstance(v, omegaconf.DictConfig):
+                ret.extend(handle_dict(idx, v, resolve=resolve))
+            elif isinstance(v, omegaconf.ListConfig):
+                ret.extend(handle_list(idx, v, resolve=resolve))
+            else:
+                ret.append((str(idx), v))
+    return ret
diff --git a/URSA/diffnext/utils/profiler.py b/URSA/diffnext/utils/profiler.py
new file mode 100644
index 0000000000000000000000000000000000000000..343e4904632dee3ed47383adea9c152fa0e14170
--- /dev/null
+++ b/URSA/diffnext/utils/profiler.py
@@ -0,0 +1,90 @@
+# ------------------------------------------------------------------------
+# Copyright (c) 2024-present, BAAI. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
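`flatten_omega_conf` above turns a nested OmegaConf tree into dotted key/value pairs; `build_wandb` in accelerate_utils.py uses it to serialize the experiment config for the tracker. A quick illustration with hypothetical values:

    import omegaconf

    cfg = omegaconf.OmegaConf.create({"training": {"seed": 42, "betas": [0.9, 0.95]}})
    print(flatten_omega_conf(cfg))
    # [('training.seed', 42), ('training.betas.0', 0.9), ('training.betas.1', 0.95)]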
+# ------------------------------------------------------------------------ +"""Profiler utilities.""" + +import collections +import contextlib +import datetime +import time +import numpy as np + + +class SmoothedValue(object): + """Track values and provide smoothed report.""" + + def __init__(self, window_size=None, fmt=None): + self.fmt = fmt or "{median:.4f} ({mean:.4f})" + self.deque = collections.deque(maxlen=window_size) + + def __str__(self): + return self.fmt.format(value=self.value, mean=self.mean, median=self.median) + + @property + def value(self): + return self.deque[-1] + + @property + def mean(self): + return np.mean(self.deque) + + @property + def median(self): + return np.median(self.deque) + + def update(self, value): + self.deque.append(value) + + +class Timer(object): + """Simple timer.""" + + def __init__(self): + self.total_time = 0.0 + self.calls = 0 + self.start_time = 0.0 + self.diff = 0.0 + self.average_time = 0.0 + + def add_diff(self, diff, n=1, average=False): + self.total_time += diff + self.calls += n + self.average_time = self.total_time / self.calls + return self.average_time if average else self.diff + + @contextlib.contextmanager + def tic_and_toc(self, n=1): + try: + yield self.tic() + finally: + self.toc(n) + + def tic(self): + self.start_time = time.time() + return self + + def toc(self, n=1, average=False): + self.diff = time.time() - self.start_time + return self.add_diff(self.diff, n, average) + + +def get_progress(timer, step, max_steps): + """Return the progress information.""" + eta_seconds = timer.average_time * (max_steps - step) + eta = str(datetime.timedelta(seconds=int(eta_seconds))) + progress = (step + 1.0) / max_steps + return "< PROGRESS: {:.2%} | SPEED: {:.3f}s / step | ETA: {} >".format( + progress, timer.average_time, eta + ) diff --git a/URSA/diffnext/utils/registry.py b/URSA/diffnext/utils/registry.py new file mode 100644 index 0000000000000000000000000000000000000000..b32ccfe31b835d1c306cd153a55c668b527f881b --- /dev/null +++ b/URSA/diffnext/utils/registry.py @@ -0,0 +1,54 @@ +# ------------------------------------------------------------------------ +# Copyright (c) 2024-present, BAAI. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
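`Timer.tic_and_toc` in profiler.py wraps one unit of work and accumulates a running average that `get_progress` turns into a progress/ETA string for the training loop. A minimal sketch:

    import time

    timer = Timer()
    for step in range(5):
        with timer.tic_and_toc():
            time.sleep(0.01)  # stand-in for one training step
    print(get_progress(timer, step=4, max_steps=5))
    # e.g. < PROGRESS: 100.00% | SPEED: 0.010s / step | ETA: 0:00:00 >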
+# ------------------------------------------------------------------------ +"""Registry utilities.""" + +import collections +import functools + + +class Registry(object): + """Registry class.""" + + def __init__(self, name): + self.name = name + self.registry = collections.OrderedDict() + + def has(self, key) -> bool: + return key in self.registry + + def register(self, name, func=None, **kwargs): + def decorated(inner_function): + for key in name if isinstance(name, (tuple, list)) else [name]: + self.registry[key] = functools.partial(inner_function, **kwargs) + return inner_function + + if func is not None: + return decorated(func) + return decorated + + def get(self, name, default=None): + if name is None: + return None + if not self.has(name): + if default is not None: + return default + raise KeyError("`%s` is not registered in <%s>." % (name, self.name)) + return self.registry[name] + + def try_get(self, name): + if self.has(name): + return self.get(name) + return None diff --git a/URSA/experiments/distill_dimo/config.yaml b/URSA/experiments/distill_dimo/config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2314e3231cb00752a9342e407e25c8e89f2be316 --- /dev/null +++ b/URSA/experiments/distill_dimo/config.yaml @@ -0,0 +1,69 @@ +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo + log_every: 50 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml diff --git a/URSA/experiments/distill_dimo/logs/20260318_161904.log b/URSA/experiments/distill_dimo/logs/20260318_161904.log new file mode 100644 index 0000000000000000000000000000000000000000..94bed27d4f0146e82ebaf496e100be0ba659ce36 --- /dev/null +++ b/URSA/experiments/distill_dimo/logs/20260318_161904.log @@ -0,0 +1,73 @@ +03/18 16:19:04 INFO train_distill_dimo.py:834] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo + log_every: 50 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + 
num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 16:19:04 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... +03/18 16:20:25 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON diff --git a/URSA/experiments/distill_dimo/logs/20260318_184748.log b/URSA/experiments/distill_dimo/logs/20260318_184748.log new file mode 100644 index 0000000000000000000000000000000000000000..76f40df080379003b414c2408666cc5c13e874e0 --- /dev/null +++ b/URSA/experiments/distill_dimo/logs/20260318_184748.log @@ -0,0 +1,76 @@ +03/18 18:47:48 INFO train_distill_dimo.py:871] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo + log_every: 50 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 18:47:48 INFO train_distill_dimo.py:153] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
+03/18 18:49:13 INFO train_distill_dimo.py:176] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON +03/18 18:49:27 INFO train_distill_dimo.py:279] [init] student params: 1982.17M +03/18 18:49:27 INFO train_distill_dimo.py:282] [init] max_train_steps=10000 batch_size_per_gpu=1 num_processes=8 +03/18 18:49:27 INFO train_distill_dimo.py:653] [train] Starting from step 0 / 10000 diff --git a/URSA/experiments/distill_dimo/logs/20260318_190248.log b/URSA/experiments/distill_dimo/logs/20260318_190248.log new file mode 100644 index 0000000000000000000000000000000000000000..c360f0bbcc23aade502cd73a3d9bd2460ba2cae0 --- /dev/null +++ b/URSA/experiments/distill_dimo/logs/20260318_190248.log @@ -0,0 +1,76 @@ +03/18 19:02:48 INFO train_distill_dimo.py:871] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo + log_every: 50 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 19:02:48 INFO train_distill_dimo.py:153] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
+03/18 19:04:13 INFO train_distill_dimo.py:176] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON +03/18 19:04:24 INFO train_distill_dimo.py:279] [init] student params: 1982.17M +03/18 19:04:24 INFO train_distill_dimo.py:282] [init] max_train_steps=10000 batch_size_per_gpu=1 num_processes=8 +03/18 19:04:24 INFO train_distill_dimo.py:653] [train] Starting from step 0 / 10000 diff --git a/URSA/experiments/distill_dimo/logs/20260318_191449.log b/URSA/experiments/distill_dimo/logs/20260318_191449.log new file mode 100644 index 0000000000000000000000000000000000000000..7f0dbc9dc2e3d30d49bda2ab2936d6ab9cc6ad15 --- /dev/null +++ b/URSA/experiments/distill_dimo/logs/20260318_191449.log @@ -0,0 +1,76 @@ +03/18 19:14:49 INFO train_distill_dimo.py:871] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo + log_every: 50 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 19:14:49 INFO train_distill_dimo.py:153] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
+03/18 19:16:15 INFO train_distill_dimo.py:176] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON +03/18 19:16:23 INFO train_distill_dimo.py:279] [init] student params: 1982.17M +03/18 19:16:23 INFO train_distill_dimo.py:282] [init] max_train_steps=10000 batch_size_per_gpu=1 num_processes=8 +03/18 19:16:23 INFO train_distill_dimo.py:653] [train] Starting from step 0 / 10000 diff --git a/URSA/experiments/distill_dimo/logs/20260318_200752.log b/URSA/experiments/distill_dimo/logs/20260318_200752.log new file mode 100644 index 0000000000000000000000000000000000000000..4e5f6e2fbf50e148b58d164a9fac708211db7f71 --- /dev/null +++ b/URSA/experiments/distill_dimo/logs/20260318_200752.log @@ -0,0 +1,72 @@ +03/18 20:07:52 INFO train_distill_dimo.py:871] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo + log_every: 50 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 20:07:52 INFO train_distill_dimo.py:153] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
diff --git a/URSA/experiments/distill_dimo/logs/20260318_203514.log b/URSA/experiments/distill_dimo/logs/20260318_203514.log new file mode 100644 index 0000000000000000000000000000000000000000..fd0884aa135c2dd8eb052346699ce0f3502990e9 --- /dev/null +++ b/URSA/experiments/distill_dimo/logs/20260318_203514.log @@ -0,0 +1,1908 @@ +03/18 20:35:14 INFO train_distill_dimo.py:871] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo + log_every: 50 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 20:35:14 INFO train_distill_dimo.py:153] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
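
The lr_s/lr_a values in the iteration logs that follow are consistent with `CosineLR` performing a linear warmup from zero over the configured 500 steps, then cosine decay from lr_max=1.0e-05 to lr_min=1.0e-06 over the remaining 9500. A minimal sketch of that schedule (the closed form is an assumption, checked against the logged values, which appear to be read just after the step counter increments):

    import math

    def cosine_lr(step, lr_max=1.0e-05, lr_min=1.0e-06,
                  max_steps=10000, warmup_steps=500):
        """Linear warmup from 0, then cosine decay to lr_min (assumed form)."""
        if step < warmup_steps:
            return lr_max * step / warmup_steps
        t = (step - warmup_steps) / (max_steps - warmup_steps)
        return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

    # Spot checks against the log below:
    #   cosine_lr(250)  = 5.00e-06   (logged lr_s=5.00e-06 at iteration 250)
    #   cosine_lr(1000) = 9.94e-06   (logged lr_s=9.94e-06 at iteration 1000)
    #   cosine_lr(4000) = 7.31e-06   (logged lr_s=7.31e-06 at iteration 4000)
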
+03/18 20:36:39 INFO train_distill_dimo.py:176] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON +03/18 20:36:50 INFO train_distill_dimo.py:279] [init] student params: 1982.17M +03/18 20:36:50 INFO train_distill_dimo.py:282] [init] max_train_steps=10000 batch_size_per_gpu=1 num_processes=8 +03/18 20:36:50 INFO train_distill_dimo.py:653] [train] Starting from step 0 / 10000 +03/18 20:39:48 INFO train_distill_dimo.py:734] Iteration 50, lr_s=1.01e-06 lr_a=1.01e-06, time=2.72s +03/18 20:39:48 INFO train_distill_dimo.py:745] Train H_mean: 3.3750 (3.4241) +03/18 20:39:48 INFO train_distill_dimo.py:745] Train baseline_ema: -0.0017 (-0.0017) +03/18 20:39:48 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.0112 (0.0241) +03/18 20:39:48 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.0065 (0.0154) +03/18 20:39:48 INFO train_distill_dimo.py:745] Train loss_pg: -0.0156 (-0.0264) +03/18 20:39:48 INFO train_distill_dimo.py:745] Train mean_logp_tok: -3.3750 (-3.4227) +03/18 20:39:48 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4524 (8.4514) +03/18 20:39:48 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.0200) +03/18 20:42:11 INFO train_distill_dimo.py:734] Iteration 100, lr_s=2.01e-06 lr_a=2.01e-06, time=2.69s +03/18 20:42:11 INFO train_distill_dimo.py:745] Train H_mean: 4.7500 (4.8144) +03/18 20:42:11 INFO train_distill_dimo.py:745] Train baseline_ema: -0.0122 (-0.0114) +03/18 20:42:11 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.0146 (0.0229) +03/18 20:42:11 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.0198 (0.0438) +03/18 20:42:11 INFO train_distill_dimo.py:745] Train loss_pg: -0.0434 (-0.1456) +03/18 20:42:11 INFO train_distill_dimo.py:745] Train mean_logp_tok: -4.6094 (-4.7594) +03/18 20:42:11 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4339 (8.4330) +03/18 20:42:11 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.0200) +03/18 20:44:34 INFO train_distill_dimo.py:734] Iteration 150, lr_s=3.01e-06 lr_a=3.01e-06, time=4.06s +03/18 20:44:34 INFO train_distill_dimo.py:745] Train H_mean: 7.8125 (7.3128) +03/18 20:44:34 INFO train_distill_dimo.py:745] Train baseline_ema: -0.0352 (-0.0342) +03/18 20:44:34 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.0203 (0.0475) +03/18 20:44:34 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.0327 (0.0836) +03/18 20:44:34 INFO train_distill_dimo.py:745] Train loss_pg: 0.0134 (-0.3499) +03/18 20:44:34 INFO train_distill_dimo.py:745] Train mean_logp_tok: -7.6562 (-7.2647) +03/18 20:44:34 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4062 (8.3771) +03/18 20:44:34 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.0400) +03/18 20:46:56 INFO train_distill_dimo.py:734] Iteration 200, lr_s=4.01e-06 lr_a=4.01e-06, time=2.79s +03/18 20:46:56 INFO train_distill_dimo.py:745] Train H_mean: 4.4688 (4.3986) +03/18 20:46:56 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1438 (-0.1111) +03/18 20:46:56 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.0688 (0.1428) +03/18 20:46:56 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1639 (0.3457) +03/18 20:46:56 INFO train_distill_dimo.py:745] Train loss_pg: -0.2761 (-0.8727) +03/18 20:46:56 INFO train_distill_dimo.py:745] Train mean_logp_tok: -5.0625 (-5.0916) +03/18 20:46:56 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4339 (8.3783) +03/18 20:46:56 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.1200) +03/18 20:49:18 INFO train_distill_dimo.py:734] Iteration 250, lr_s=5.00e-06 
lr_a=5.00e-06, time=2.68s +03/18 20:49:18 INFO train_distill_dimo.py:745] Train H_mean: 4.0312 (5.0499) +03/18 20:49:18 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1556 (-0.1530) +03/18 20:49:18 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.0742 (0.1269) +03/18 20:49:18 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1139 (0.1891) +03/18 20:49:18 INFO train_distill_dimo.py:745] Train loss_pg: 0.2116 (0.1463) +03/18 20:49:18 INFO train_distill_dimo.py:745] Train mean_logp_tok: -4.7344 (-5.3605) +03/18 20:49:18 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4540 (8.4527) +03/18 20:49:18 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.1400) +03/18 20:51:41 INFO train_distill_dimo.py:734] Iteration 300, lr_s=6.00e-06 lr_a=6.00e-06, time=2.71s +03/18 20:51:41 INFO train_distill_dimo.py:745] Train H_mean: 4.0781 (4.2345) +03/18 20:51:41 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2081 (-0.1873) +03/18 20:51:41 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.1053 (0.1667) +03/18 20:51:41 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1643 (0.3167) +03/18 20:51:41 INFO train_distill_dimo.py:745] Train loss_pg: 0.1967 (-0.4150) +03/18 20:51:41 INFO train_distill_dimo.py:745] Train mean_logp_tok: -4.3906 (-4.5009) +03/18 20:51:41 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4560 (8.4545) +03/18 20:51:41 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.1400) +03/18 20:54:04 INFO train_distill_dimo.py:734] Iteration 350, lr_s=7.00e-06 lr_a=7.00e-06, time=2.73s +03/18 20:54:04 INFO train_distill_dimo.py:745] Train H_mean: 3.5859 (3.9712) +03/18 20:54:04 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2124 (-0.2119) +03/18 20:54:04 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.1254 (0.2710) +03/18 20:54:04 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1539 (0.2513) +03/18 20:54:04 INFO train_distill_dimo.py:745] Train loss_pg: 0.3013 (0.0071) +03/18 20:54:04 INFO train_distill_dimo.py:745] Train mean_logp_tok: -3.7188 (-4.0655) +03/18 20:54:04 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4538 (8.4539) +03/18 20:54:04 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.1200) +03/18 20:56:26 INFO train_distill_dimo.py:734] Iteration 400, lr_s=8.00e-06 lr_a=8.00e-06, time=2.79s +03/18 20:56:26 INFO train_distill_dimo.py:745] Train H_mean: 4.8438 (4.7670) +03/18 20:56:26 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1996 (-0.1995) +03/18 20:56:26 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.1688 (0.2627) +03/18 20:56:26 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1241 (0.1890) +03/18 20:56:26 INFO train_distill_dimo.py:745] Train loss_pg: 0.4584 (0.2404) +03/18 20:56:26 INFO train_distill_dimo.py:745] Train mean_logp_tok: -4.7812 (-4.6900) +03/18 20:56:26 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4570 (8.4554) +03/18 20:56:26 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.2400) +03/18 20:58:47 INFO train_distill_dimo.py:734] Iteration 450, lr_s=9.00e-06 lr_a=9.00e-06, time=2.70s +03/18 20:58:47 INFO train_distill_dimo.py:745] Train H_mean: 7.1875 (6.7705) +03/18 20:58:47 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2130 (-0.2117) +03/18 20:58:47 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2105 (0.3956) +03/18 20:58:47 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1452 (0.3295) +03/18 20:58:47 INFO train_distill_dimo.py:745] Train loss_pg: 0.5317 (-0.4927) +03/18 20:58:47 INFO 
train_distill_dimo.py:745] Train mean_logp_tok: -7.2031 (-6.8055) +03/18 20:58:47 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4584 (8.4592) +03/18 20:58:47 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.2400) +03/18 21:01:11 INFO train_distill_dimo.py:734] Iteration 500, lr_s=1.00e-05 lr_a=1.00e-05, time=2.71s +03/18 21:01:11 INFO train_distill_dimo.py:745] Train H_mean: 7.4219 (6.1672) +03/18 21:01:11 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1966 (-0.1956) +03/18 21:01:11 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2617 (0.3154) +03/18 21:01:11 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1343 (0.2039) +03/18 21:01:11 INFO train_distill_dimo.py:745] Train loss_pg: 0.3728 (0.2392) +03/18 21:01:11 INFO train_distill_dimo.py:745] Train mean_logp_tok: -7.8906 (-6.5155) +03/18 21:01:11 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4573 (8.4361) +03/18 21:01:11 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.2600) +03/18 21:01:11 INFO train_distill_dimo.py:667] < PROGRESS: 5.01% | SPEED: 2.922s / step | ETA: 7:42:38 > +03/18 21:03:36 INFO train_distill_dimo.py:734] Iteration 550, lr_s=1.00e-05 lr_a=1.00e-05, time=2.71s +03/18 21:03:36 INFO train_distill_dimo.py:745] Train H_mean: 3.7344 (4.4553) +03/18 21:03:36 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2029 (-0.2017) +03/18 21:03:36 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3211 (0.5040) +03/18 21:03:36 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2175 (0.3580) +03/18 21:03:36 INFO train_distill_dimo.py:745] Train loss_pg: 0.0058 (-0.2480) +03/18 21:03:36 INFO train_distill_dimo.py:745] Train mean_logp_tok: -3.8438 (-4.6481) +03/18 21:03:36 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4575 (8.4572) +03/18 21:03:36 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.1600) +03/18 21:05:59 INFO train_distill_dimo.py:734] Iteration 600, lr_s=1.00e-05 lr_a=1.00e-05, time=2.74s +03/18 21:05:59 INFO train_distill_dimo.py:745] Train H_mean: 8.0625 (7.4375) +03/18 21:05:59 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1982 (-0.1982) +03/18 21:05:59 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2054 (0.3253) +03/18 21:05:59 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1357 (0.1726) +03/18 21:05:59 INFO train_distill_dimo.py:745] Train loss_pg: 0.5894 (0.4006) +03/18 21:05:59 INFO train_distill_dimo.py:745] Train mean_logp_tok: -8.2812 (-7.6141) +03/18 21:05:59 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4608 (8.4614) +03/18 21:05:59 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.2400) +03/18 21:08:23 INFO train_distill_dimo.py:734] Iteration 650, lr_s=9.99e-06 lr_a=9.99e-06, time=2.71s +03/18 21:08:23 INFO train_distill_dimo.py:745] Train H_mean: 7.5000 (6.5567) +03/18 21:08:23 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1745 (-0.1741) +03/18 21:08:23 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2443 (0.3354) +03/18 21:08:23 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1819 (0.2115) +03/18 21:08:23 INFO train_distill_dimo.py:745] Train loss_pg: 0.4403 (0.2624) +03/18 21:08:23 INFO train_distill_dimo.py:745] Train mean_logp_tok: -7.7656 (-6.9380) +03/18 21:08:23 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4602 (8.4598) +03/18 21:08:23 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.3600) +03/18 21:10:47 INFO train_distill_dimo.py:734] Iteration 700, lr_s=9.99e-06 lr_a=9.99e-06, time=3.94s +03/18 21:10:47 
INFO train_distill_dimo.py:745] Train H_mean: 7.0156 (6.5630) +03/18 21:10:47 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1671 (-0.1675) +03/18 21:10:47 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.1978 (0.3069) +03/18 21:10:47 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1477 (0.2061) +03/18 21:10:47 INFO train_distill_dimo.py:745] Train loss_pg: 0.2668 (0.0306) +03/18 21:10:47 INFO train_distill_dimo.py:745] Train mean_logp_tok: -7.2500 (-6.6694) +03/18 21:10:47 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4671 (8.4668) +03/18 21:10:47 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.3200) +03/18 21:13:11 INFO train_distill_dimo.py:734] Iteration 750, lr_s=9.98e-06 lr_a=9.98e-06, time=2.73s +03/18 21:13:11 INFO train_distill_dimo.py:745] Train H_mean: 6.3750 (6.4122) +03/18 21:13:11 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1655 (-0.1648) +03/18 21:13:11 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2427 (0.3383) +03/18 21:13:11 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1476 (0.2286) +03/18 21:13:11 INFO train_distill_dimo.py:745] Train loss_pg: 0.2150 (-0.0159) +03/18 21:13:11 INFO train_distill_dimo.py:745] Train mean_logp_tok: -6.4844 (-6.4337) +03/18 21:13:11 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4674 (8.4667) +03/18 21:13:11 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.4600) +03/18 21:15:35 INFO train_distill_dimo.py:734] Iteration 800, lr_s=9.98e-06 lr_a=9.98e-06, time=2.73s +03/18 21:15:35 INFO train_distill_dimo.py:745] Train H_mean: 5.0781 (5.7780) +03/18 21:15:35 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1661 (-0.1661) +03/18 21:15:35 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3242 (0.3584) +03/18 21:15:35 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1796 (0.2163) +03/18 21:15:35 INFO train_distill_dimo.py:745] Train loss_pg: 0.1942 (0.1126) +03/18 21:15:35 INFO train_distill_dimo.py:745] Train mean_logp_tok: -5.1406 (-5.6778) +03/18 21:15:35 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4635 (8.4632) +03/18 21:15:35 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.4400) +03/18 21:18:00 INFO train_distill_dimo.py:734] Iteration 850, lr_s=9.97e-06 lr_a=9.97e-06, time=2.73s +03/18 21:18:00 INFO train_distill_dimo.py:745] Train H_mean: 6.0000 (6.1567) +03/18 21:18:00 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1668 (-0.1674) +03/18 21:18:00 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.1763 (0.2813) +03/18 21:18:00 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1380 (0.2411) +03/18 21:18:00 INFO train_distill_dimo.py:745] Train loss_pg: 0.2565 (-0.1403) +03/18 21:18:00 INFO train_distill_dimo.py:745] Train mean_logp_tok: -6.4844 (-6.5184) +03/18 21:18:00 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4587 (8.4554) +03/18 21:18:00 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.2800) +03/18 21:20:24 INFO train_distill_dimo.py:734] Iteration 900, lr_s=9.96e-06 lr_a=9.96e-06, time=2.77s +03/18 21:20:24 INFO train_distill_dimo.py:745] Train H_mean: 4.7031 (6.1169) +03/18 21:20:24 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1572 (-0.1572) +03/18 21:20:24 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.1993 (0.3248) +03/18 21:20:24 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1330 (0.1849) +03/18 21:20:24 INFO train_distill_dimo.py:745] Train loss_pg: 0.2133 (0.1925) +03/18 21:20:24 INFO train_distill_dimo.py:745] Train 
mean_logp_tok: -5.3438 (-6.4197) +03/18 21:20:24 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4663 (8.4658) +03/18 21:20:24 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.4800) +03/18 21:22:48 INFO train_distill_dimo.py:734] Iteration 950, lr_s=9.95e-06 lr_a=9.95e-06, time=3.14s +03/18 21:22:48 INFO train_distill_dimo.py:745] Train H_mean: 5.5938 (6.2375) +03/18 21:22:48 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1500 (-0.1500) +03/18 21:22:48 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2023 (0.2959) +03/18 21:22:48 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1430 (0.1871) +03/18 21:22:48 INFO train_distill_dimo.py:745] Train loss_pg: 0.1800 (0.0750) +03/18 21:22:48 INFO train_distill_dimo.py:745] Train mean_logp_tok: -6.2031 (-6.3784) +03/18 21:22:48 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4699 (8.4691) +03/18 21:22:48 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.3200) +03/18 21:25:11 INFO train_distill_dimo.py:734] Iteration 1000, lr_s=9.94e-06 lr_a=9.94e-06, time=3.13s +03/18 21:25:11 INFO train_distill_dimo.py:745] Train H_mean: 8.1250 (7.2098) +03/18 21:25:11 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1484 (-0.1479) +03/18 21:25:11 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2383 (0.3443) +03/18 21:25:11 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1671 (0.2018) +03/18 21:25:11 INFO train_distill_dimo.py:745] Train loss_pg: 0.2909 (0.1403) +03/18 21:25:11 INFO train_distill_dimo.py:745] Train mean_logp_tok: -8.0156 (-7.2950) +03/18 21:25:11 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4682 (8.4690) +03/18 21:25:11 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.4400) +03/18 21:25:11 INFO train_distill_dimo.py:667] < PROGRESS: 10.01% | SPEED: 2.901s / step | ETA: 7:15:06 > +03/18 21:25:30 INFO train_distill_dimo.py:720] [save] step=1000 → ./experiments/distill_dimo/checkpoints/checkpoint-1000 +03/18 21:27:55 INFO train_distill_dimo.py:734] Iteration 1050, lr_s=9.93e-06 lr_a=9.93e-06, time=2.71s +03/18 21:27:55 INFO train_distill_dimo.py:745] Train H_mean: 8.7500 (8.3763) +03/18 21:27:55 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1422 (-0.1418) +03/18 21:27:55 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2347 (0.3808) +03/18 21:27:55 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1341 (0.2224) +03/18 21:27:55 INFO train_distill_dimo.py:745] Train loss_pg: 0.3939 (0.0356) +03/18 21:27:55 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.0000 (-8.4777) +03/18 21:27:55 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4685 (8.4706) +03/18 21:27:55 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.4800) +03/18 21:30:21 INFO train_distill_dimo.py:734] Iteration 1100, lr_s=9.91e-06 lr_a=9.91e-06, time=2.70s +03/18 21:30:21 INFO train_distill_dimo.py:745] Train H_mean: 8.6875 (7.9212) +03/18 21:30:21 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1382 (-0.1401) +03/18 21:30:21 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2771 (0.3468) +03/18 21:30:21 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1894 (0.2346) +03/18 21:30:21 INFO train_distill_dimo.py:745] Train loss_pg: 0.2006 (-0.4961) +03/18 21:30:21 INFO train_distill_dimo.py:745] Train mean_logp_tok: -8.9375 (-8.3644) +03/18 21:30:21 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4698 (8.4708) +03/18 21:30:21 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.5600) +03/18 21:32:47 INFO 
train_distill_dimo.py:734] Iteration 1150, lr_s=9.90e-06 lr_a=9.90e-06, time=2.70s +03/18 21:32:47 INFO train_distill_dimo.py:745] Train H_mean: 6.3594 (6.6694) +03/18 21:32:47 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1660 (-0.1655) +03/18 21:32:47 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2696 (0.4590) +03/18 21:32:47 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1879 (0.2704) +03/18 21:32:47 INFO train_distill_dimo.py:745] Train loss_pg: 0.1804 (-0.2790) +03/18 21:32:47 INFO train_distill_dimo.py:745] Train mean_logp_tok: -6.4375 (-6.8006) +03/18 21:32:47 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4665 (8.4670) +03/18 21:32:47 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.4600) +03/18 21:35:13 INFO train_distill_dimo.py:734] Iteration 1200, lr_s=9.88e-06 lr_a=9.88e-06, time=2.75s +03/18 21:35:13 INFO train_distill_dimo.py:745] Train H_mean: 10.1250 (9.5994) +03/18 21:35:13 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1760 (-0.1821) +03/18 21:35:13 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3055 (0.4614) +03/18 21:35:13 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1896 (0.3128) +03/18 21:35:13 INFO train_distill_dimo.py:745] Train loss_pg: 0.3458 (-0.4040) +03/18 21:35:13 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.0938 (-9.6538) +03/18 21:35:13 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4749 (8.4737) +03/18 21:35:13 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.6200) +03/18 21:37:39 INFO train_distill_dimo.py:734] Iteration 1250, lr_s=9.86e-06 lr_a=9.86e-06, time=4.16s +03/18 21:37:39 INFO train_distill_dimo.py:745] Train H_mean: 5.2344 (5.5777) +03/18 21:37:39 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2058 (-0.2063) +03/18 21:37:39 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2987 (0.4499) +03/18 21:37:39 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1870 (0.3108) +03/18 21:37:39 INFO train_distill_dimo.py:745] Train loss_pg: 0.3408 (-0.1744) +03/18 21:37:39 INFO train_distill_dimo.py:745] Train mean_logp_tok: -5.0625 (-5.5755) +03/18 21:37:39 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4702 (8.4694) +03/18 21:37:39 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.6800) +03/18 21:40:03 INFO train_distill_dimo.py:734] Iteration 1300, lr_s=9.84e-06 lr_a=9.84e-06, time=2.71s +03/18 21:40:03 INFO train_distill_dimo.py:745] Train H_mean: 8.8125 (8.0388) +03/18 21:40:03 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1876 (-0.1868) +03/18 21:40:03 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3067 (0.3852) +03/18 21:40:03 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2152 (0.2410) +03/18 21:40:03 INFO train_distill_dimo.py:745] Train loss_pg: 0.4498 (0.2739) +03/18 21:40:03 INFO train_distill_dimo.py:745] Train mean_logp_tok: -8.7500 (-8.0547) +03/18 21:40:03 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4714 (8.4721) +03/18 21:40:03 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.6800) +03/18 21:42:28 INFO train_distill_dimo.py:734] Iteration 1350, lr_s=9.82e-06 lr_a=9.82e-06, time=2.72s +03/18 21:42:28 INFO train_distill_dimo.py:745] Train H_mean: 8.2188 (7.5062) +03/18 21:42:28 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2775 (-0.2598) +03/18 21:42:28 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2873 (0.3980) +03/18 21:42:28 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1817 (0.5156) +03/18 21:42:28 INFO 
train_distill_dimo.py:745] Train loss_pg: 0.5936 (-1.3827) +03/18 21:42:28 INFO train_distill_dimo.py:745] Train mean_logp_tok: -8.2812 (-7.4894) +03/18 21:42:28 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4717 (8.4729) +03/18 21:42:28 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.6400) +03/18 21:44:53 INFO train_distill_dimo.py:734] Iteration 1400, lr_s=9.80e-06 lr_a=9.80e-06, time=2.70s +03/18 21:44:53 INFO train_distill_dimo.py:745] Train H_mean: 6.2031 (6.5934) +03/18 21:44:53 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2651 (-0.2643) +03/18 21:44:53 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2975 (0.3951) +03/18 21:44:53 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2031 (0.3117) +03/18 21:44:53 INFO train_distill_dimo.py:745] Train loss_pg: 0.6243 (0.0404) +03/18 21:44:53 INFO train_distill_dimo.py:745] Train mean_logp_tok: -6.2500 (-6.7378) +03/18 21:44:53 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4692 (8.3994) +03/18 21:44:53 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.8200) +03/18 21:47:18 INFO train_distill_dimo.py:734] Iteration 1450, lr_s=9.78e-06 lr_a=9.78e-06, time=2.81s +03/18 21:47:18 INFO train_distill_dimo.py:745] Train H_mean: 4.2812 (4.9034) +03/18 21:47:18 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2570 (-0.2590) +03/18 21:47:18 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3591 (0.6105) +03/18 21:47:18 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2177 (0.3037) +03/18 21:47:18 INFO train_distill_dimo.py:745] Train loss_pg: 0.4159 (0.1873) +03/18 21:47:18 INFO train_distill_dimo.py:745] Train mean_logp_tok: -4.3906 (-5.1437) +03/18 21:47:18 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4093 (8.1855) +03/18 21:47:18 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.8000) +03/18 21:49:44 INFO train_distill_dimo.py:734] Iteration 1500, lr_s=9.76e-06 lr_a=9.76e-06, time=3.65s +03/18 21:49:44 INFO train_distill_dimo.py:745] Train H_mean: 10.6250 (9.5222) +03/18 21:49:44 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2429 (-0.2737) +03/18 21:49:44 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3148 (0.3930) +03/18 21:49:44 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2158 (0.4897) +03/18 21:49:44 INFO train_distill_dimo.py:745] Train loss_pg: 1.1557 (-1.3311) +03/18 21:49:44 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.2812 (-9.5106) +03/18 21:49:44 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4668 (8.4659) +03/18 21:49:44 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.7400) +03/18 21:49:44 INFO train_distill_dimo.py:667] < PROGRESS: 15.01% | SPEED: 2.903s / step | ETA: 6:51:18 > +03/18 21:52:10 INFO train_distill_dimo.py:734] Iteration 1550, lr_s=9.73e-06 lr_a=9.73e-06, time=3.24s +03/18 21:52:10 INFO train_distill_dimo.py:745] Train H_mean: 9.2188 (9.3313) +03/18 21:52:10 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2799 (-0.2823) +03/18 21:52:10 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3972 (0.4500) +03/18 21:52:10 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1958 (0.2277) +03/18 21:52:10 INFO train_distill_dimo.py:745] Train loss_pg: 1.6930 (1.3308) +03/18 21:52:10 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.3125 (-9.4750) +03/18 21:52:10 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4728 (8.4670) +03/18 21:52:10 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.8400) +03/18 21:54:35 INFO 
train_distill_dimo.py:734] Iteration 1600, lr_s=9.71e-06 lr_a=9.71e-06, time=2.74s +03/18 21:54:35 INFO train_distill_dimo.py:745] Train H_mean: 8.7188 (6.8561) +03/18 21:54:35 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2286 (-0.2295) +03/18 21:54:35 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3854 (0.5980) +03/18 21:54:35 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2125 (0.2862) +03/18 21:54:35 INFO train_distill_dimo.py:745] Train loss_pg: 0.2454 (0.3418) +03/18 21:54:35 INFO train_distill_dimo.py:745] Train mean_logp_tok: -8.8125 (-7.3058) +03/18 21:54:35 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4640 (8.4681) +03/18 21:54:35 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.7200) +03/18 21:57:00 INFO train_distill_dimo.py:734] Iteration 1650, lr_s=9.68e-06 lr_a=9.68e-06, time=2.72s +03/18 21:57:00 INFO train_distill_dimo.py:745] Train H_mean: 4.2656 (5.0978) +03/18 21:57:00 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2179 (-0.2176) +03/18 21:57:00 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3500 (0.5060) +03/18 21:57:00 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2579 (0.3394) +03/18 21:57:00 INFO train_distill_dimo.py:745] Train loss_pg: 0.3122 (-0.4377) +03/18 21:57:00 INFO train_distill_dimo.py:745] Train mean_logp_tok: -4.4688 (-5.2787) +03/18 21:57:00 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4686 (8.4615) +03/18 21:57:00 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.8200) +03/18 21:59:27 INFO train_distill_dimo.py:734] Iteration 1700, lr_s=9.65e-06 lr_a=9.65e-06, time=2.75s +03/18 21:59:27 INFO train_distill_dimo.py:745] Train H_mean: 5.4844 (5.6181) +03/18 21:59:27 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2292 (-0.2292) +03/18 21:59:27 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3846 (0.4734) +03/18 21:59:27 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2160 (0.2985) +03/18 21:59:27 INFO train_distill_dimo.py:745] Train loss_pg: 0.4494 (-0.0791) +03/18 21:59:27 INFO train_distill_dimo.py:745] Train mean_logp_tok: -5.6406 (-5.6847) +03/18 21:59:27 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4696 (8.4694) +03/18 21:59:27 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.8400) +03/18 22:01:52 INFO train_distill_dimo.py:734] Iteration 1750, lr_s=9.62e-06 lr_a=9.62e-06, time=3.13s +03/18 22:01:52 INFO train_distill_dimo.py:745] Train H_mean: 6.2969 (6.4930) +03/18 22:01:52 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2196 (-0.2212) +03/18 22:01:52 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3568 (0.3781) +03/18 22:01:52 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2145 (0.2616) +03/18 22:01:52 INFO train_distill_dimo.py:745] Train loss_pg: 0.2618 (0.2148) +03/18 22:01:52 INFO train_distill_dimo.py:745] Train mean_logp_tok: -6.3281 (-6.4616) +03/18 22:01:52 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4721 (8.4718) +03/18 22:01:52 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.8400) +03/18 22:04:17 INFO train_distill_dimo.py:734] Iteration 1800, lr_s=9.59e-06 lr_a=9.59e-06, time=3.66s +03/18 22:04:17 INFO train_distill_dimo.py:745] Train H_mean: 10.2500 (9.9094) +03/18 22:04:17 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2162 (-0.2166) +03/18 22:04:17 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3939 (0.5356) +03/18 22:04:17 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2621 (0.3059) +03/18 22:04:17 INFO 
train_distill_dimo.py:745] Train loss_pg: 0.5398 (0.2182) +03/18 22:04:17 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.2812 (-9.9375) +03/18 22:04:17 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4738 (8.4753) +03/18 22:04:17 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.8600) +03/18 22:06:41 INFO train_distill_dimo.py:734] Iteration 1850, lr_s=9.56e-06 lr_a=9.56e-06, time=2.71s +03/18 22:06:41 INFO train_distill_dimo.py:745] Train H_mean: 9.6562 (9.0672) +03/18 22:06:41 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2747 (-0.2583) +03/18 22:06:41 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3141 (0.4207) +03/18 22:06:41 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2255 (0.4467) +03/18 22:06:41 INFO train_distill_dimo.py:745] Train loss_pg: 0.8500 (-0.7939) +03/18 22:06:41 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.6250 (-9.0931) +03/18 22:06:41 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4712 (8.4740) +03/18 22:06:41 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.8800) +03/18 22:09:07 INFO train_distill_dimo.py:734] Iteration 1900, lr_s=9.53e-06 lr_a=9.53e-06, time=2.86s +03/18 22:09:07 INFO train_distill_dimo.py:745] Train H_mean: 4.9375 (5.4769) +03/18 22:09:07 INFO train_distill_dimo.py:745] Train baseline_ema: -0.4143 (-0.4122) +03/18 22:09:07 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3465 (0.3980) +03/18 22:09:07 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2774 (0.8209) +03/18 22:09:07 INFO train_distill_dimo.py:745] Train loss_pg: 0.8865 (-2.1335) +03/18 22:09:07 INFO train_distill_dimo.py:745] Train mean_logp_tok: -4.8438 (-5.5341) +03/18 22:09:07 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4674 (8.4668) +03/18 22:09:07 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.9200) +03/18 22:11:33 INFO train_distill_dimo.py:734] Iteration 1950, lr_s=9.49e-06 lr_a=9.49e-06, time=2.71s +03/18 22:11:33 INFO train_distill_dimo.py:745] Train H_mean: 10.2500 (9.4686) +03/18 22:11:33 INFO train_distill_dimo.py:745] Train baseline_ema: -1.4293 (-1.3772) +03/18 22:11:33 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.4385 (0.5928) +03/18 22:11:33 INFO train_distill_dimo.py:745] Train loss_kd_cond: 4.2764 (4.5101) +03/18 22:11:33 INFO train_distill_dimo.py:745] Train loss_pg: -31.3397 (-31.6821) +03/18 22:11:33 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.2812 (-9.9281) +03/18 22:11:33 INFO train_distill_dimo.py:745] Train tok_entropy: 8.3464 (8.2270) +03/18 22:11:33 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (0.9800) +03/18 22:13:59 INFO train_distill_dimo.py:734] Iteration 2000, lr_s=9.46e-06 lr_a=9.46e-06, time=2.70s +03/18 22:13:59 INFO train_distill_dimo.py:745] Train H_mean: 9.5000 (7.7692) +03/18 22:13:59 INFO train_distill_dimo.py:745] Train baseline_ema: -2.7100 (-2.6685) +03/18 22:13:59 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3827 (0.5217) +03/18 22:13:59 INFO train_distill_dimo.py:745] Train loss_kd_cond: 3.6942 (3.7753) +03/18 22:13:59 INFO train_distill_dimo.py:745] Train loss_pg: -9.4110 (-14.7203) +03/18 22:13:59 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.9375 (-8.0806) +03/18 22:13:59 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4310 (8.2723) +03/18 22:13:59 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:13:59 INFO train_distill_dimo.py:667] < PROGRESS: 20.01% | SPEED: 2.905s / step | ETA: 6:27:16 > +03/18 22:14:18 INFO 
train_distill_dimo.py:720] [save] step=2000 → ./experiments/distill_dimo/checkpoints/checkpoint-2000 +03/18 22:16:43 INFO train_distill_dimo.py:734] Iteration 2050, lr_s=9.42e-06 lr_a=9.42e-06, time=3.12s +03/18 22:16:43 INFO train_distill_dimo.py:745] Train H_mean: 6.0000 (5.7856) +03/18 22:16:43 INFO train_distill_dimo.py:745] Train baseline_ema: -2.1114 (-2.1247) +03/18 22:16:43 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2753 (0.5141) +03/18 22:16:43 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.3491 (0.4919) +03/18 22:16:43 INFO train_distill_dimo.py:745] Train loss_pg: 10.3677 (10.1826) +03/18 22:16:43 INFO train_distill_dimo.py:745] Train mean_logp_tok: -5.8750 (-5.8262) +03/18 22:16:43 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4587 (8.4579) +03/18 22:16:43 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:19:08 INFO train_distill_dimo.py:734] Iteration 2100, lr_s=9.39e-06 lr_a=9.39e-06, time=3.11s +03/18 22:19:08 INFO train_distill_dimo.py:745] Train H_mean: 7.7500 (7.4414) +03/18 22:19:08 INFO train_distill_dimo.py:745] Train baseline_ema: -1.3610 (-1.3717) +03/18 22:19:08 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2584 (0.3733) +03/18 22:19:08 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2086 (0.2506) +03/18 22:19:08 INFO train_distill_dimo.py:745] Train loss_pg: 9.3317 (8.9939) +03/18 22:19:08 INFO train_distill_dimo.py:745] Train mean_logp_tok: -7.6875 (-7.4872) +03/18 22:19:08 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4658 (8.4648) +03/18 22:19:08 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:21:33 INFO train_distill_dimo.py:734] Iteration 2150, lr_s=9.35e-06 lr_a=9.35e-06, time=2.69s +03/18 22:21:33 INFO train_distill_dimo.py:745] Train H_mean: 10.0938 (9.2992) +03/18 22:21:33 INFO train_distill_dimo.py:745] Train baseline_ema: -0.8914 (-0.8980) +03/18 22:21:33 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2794 (0.3336) +03/18 22:21:33 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1999 (0.2434) +03/18 22:21:33 INFO train_distill_dimo.py:745] Train loss_pg: 6.9837 (6.7240) +03/18 22:21:33 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.0938 (-9.3644) +03/18 22:21:33 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4766 (8.4760) +03/18 22:21:33 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:23:57 INFO train_distill_dimo.py:734] Iteration 2200, lr_s=9.31e-06 lr_a=9.31e-06, time=2.72s +03/18 22:23:57 INFO train_distill_dimo.py:745] Train H_mean: 10.4375 (9.8500) +03/18 22:23:57 INFO train_distill_dimo.py:745] Train baseline_ema: -0.6150 (-0.6255) +03/18 22:23:57 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3032 (0.3837) +03/18 22:23:57 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1918 (0.3006) +03/18 22:23:57 INFO train_distill_dimo.py:745] Train loss_pg: 4.5032 (3.8291) +03/18 22:23:57 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5312 (-9.9563) +03/18 22:23:57 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4798 (8.4796) +03/18 22:23:57 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:26:24 INFO train_distill_dimo.py:734] Iteration 2250, lr_s=9.27e-06 lr_a=9.27e-06, time=2.76s +03/18 22:26:24 INFO train_distill_dimo.py:745] Train H_mean: 10.5000 (9.6075) +03/18 22:26:24 INFO train_distill_dimo.py:745] Train baseline_ema: -0.4684 (-0.4696) +03/18 22:26:24 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2820 
(0.4236) +03/18 22:26:24 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1805 (0.2770) +03/18 22:26:24 INFO train_distill_dimo.py:745] Train loss_pg: 2.9918 (2.4826) +03/18 22:26:24 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5625 (-9.6867) +03/18 22:26:24 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4804 (8.4775) +03/18 22:26:24 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:28:48 INFO train_distill_dimo.py:734] Iteration 2300, lr_s=9.23e-06 lr_a=9.23e-06, time=2.74s +03/18 22:28:48 INFO train_distill_dimo.py:745] Train H_mean: 9.3750 (9.0337) +03/18 22:28:48 INFO train_distill_dimo.py:745] Train baseline_ema: -0.3688 (-0.3708) +03/18 22:28:48 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2909 (0.4026) +03/18 22:28:48 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1908 (0.2867) +03/18 22:28:48 INFO train_distill_dimo.py:745] Train loss_pg: 1.5625 (0.9734) +03/18 22:28:48 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.4688 (-9.1769) +03/18 22:28:48 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4789 (8.4761) +03/18 22:28:48 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:31:12 INFO train_distill_dimo.py:734] Iteration 2350, lr_s=9.18e-06 lr_a=9.18e-06, time=3.64s +03/18 22:31:12 INFO train_distill_dimo.py:745] Train H_mean: 9.3438 (8.6280) +03/18 22:31:12 INFO train_distill_dimo.py:745] Train baseline_ema: -0.3228 (-0.3220) +03/18 22:31:12 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3102 (0.3256) +03/18 22:31:12 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1753 (0.2086) +03/18 22:31:12 INFO train_distill_dimo.py:745] Train loss_pg: 1.4175 (1.0618) +03/18 22:31:12 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.4062 (-8.6775) +03/18 22:31:12 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4777 (8.4762) +03/18 22:31:12 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:33:36 INFO train_distill_dimo.py:734] Iteration 2400, lr_s=9.14e-06 lr_a=9.14e-06, time=2.73s +03/18 22:33:36 INFO train_distill_dimo.py:745] Train H_mean: 9.8125 (9.4941) +03/18 22:33:36 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2801 (-0.2778) +03/18 22:33:36 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2949 (0.4010) +03/18 22:33:36 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2045 (0.2991) +03/18 22:33:36 INFO train_distill_dimo.py:745] Train loss_pg: 0.8711 (0.1309) +03/18 22:33:36 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.9062 (-9.5577) +03/18 22:33:36 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4799 (8.4770) +03/18 22:33:36 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:36:01 INFO train_distill_dimo.py:734] Iteration 2450, lr_s=9.10e-06 lr_a=9.10e-06, time=2.71s +03/18 22:36:01 INFO train_distill_dimo.py:745] Train H_mean: 9.7500 (9.1181) +03/18 22:36:01 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2560 (-0.2596) +03/18 22:36:01 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3204 (0.4369) +03/18 22:36:01 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2151 (0.2650) +03/18 22:36:01 INFO train_distill_dimo.py:745] Train loss_pg: 0.8487 (0.4176) +03/18 22:36:01 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.7500 (-9.1594) +03/18 22:36:01 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4775 (8.4769) +03/18 22:36:01 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:38:25 INFO 
train_distill_dimo.py:734] Iteration 2500, lr_s=9.05e-06 lr_a=9.05e-06, time=2.74s +03/18 22:38:25 INFO train_distill_dimo.py:745] Train H_mean: 8.2188 (7.8259) +03/18 22:38:25 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2478 (-0.2472) +03/18 22:38:25 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2801 (0.3614) +03/18 22:38:25 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1842 (0.2305) +03/18 22:38:25 INFO train_distill_dimo.py:745] Train loss_pg: 0.6970 (0.3152) +03/18 22:38:25 INFO train_distill_dimo.py:745] Train mean_logp_tok: -8.2188 (-7.9112) +03/18 22:38:25 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4715 (8.4727) +03/18 22:38:25 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:38:25 INFO train_distill_dimo.py:667] < PROGRESS: 25.01% | SPEED: 2.903s / step | ETA: 6:02:48 > +03/18 22:40:50 INFO train_distill_dimo.py:734] Iteration 2550, lr_s=9.01e-06 lr_a=9.01e-06, time=2.77s +03/18 22:40:50 INFO train_distill_dimo.py:745] Train H_mean: 9.1875 (8.6328) +03/18 22:40:50 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2616 (-0.2539) +03/18 22:40:50 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2581 (0.2848) +03/18 22:40:50 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1875 (0.3315) +03/18 22:40:50 INFO train_distill_dimo.py:745] Train loss_pg: 0.9756 (-0.6597) +03/18 22:40:50 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.4375 (-8.8306) +03/18 22:40:50 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4717 (8.4727) +03/18 22:40:50 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:43:15 INFO train_distill_dimo.py:734] Iteration 2600, lr_s=8.96e-06 lr_a=8.96e-06, time=3.14s +03/18 22:43:15 INFO train_distill_dimo.py:745] Train H_mean: 8.7500 (8.4072) +03/18 22:43:15 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2467 (-0.2479) +03/18 22:43:15 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3390 (0.5306) +03/18 22:43:15 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2181 (0.2972) +03/18 22:43:15 INFO train_distill_dimo.py:745] Train loss_pg: 0.8424 (0.4060) +03/18 22:43:15 INFO train_distill_dimo.py:745] Train mean_logp_tok: -8.5938 (-8.4262) +03/18 22:43:15 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4660 (8.4676) +03/18 22:43:15 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:45:40 INFO train_distill_dimo.py:734] Iteration 2650, lr_s=8.91e-06 lr_a=8.91e-06, time=3.63s +03/18 22:45:40 INFO train_distill_dimo.py:745] Train H_mean: 8.7500 (8.1944) +03/18 22:45:40 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2355 (-0.2351) +03/18 22:45:40 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3066 (0.4106) +03/18 22:45:40 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2174 (0.2821) +03/18 22:45:40 INFO train_distill_dimo.py:745] Train loss_pg: 0.3596 (-0.1083) +03/18 22:45:40 INFO train_distill_dimo.py:745] Train mean_logp_tok: -8.7812 (-8.2675) +03/18 22:45:40 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4665 (8.4613) +03/18 22:45:40 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:48:04 INFO train_distill_dimo.py:734] Iteration 2700, lr_s=8.86e-06 lr_a=8.86e-06, time=2.73s +03/18 22:48:04 INFO train_distill_dimo.py:745] Train H_mean: 5.3125 (5.6939) +03/18 22:48:04 INFO train_distill_dimo.py:745] Train baseline_ema: -0.3140 (-0.3148) +03/18 22:48:04 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3004 (0.5046) +03/18 22:48:04 
INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2673 (0.5459) +03/18 22:48:04 INFO train_distill_dimo.py:745] Train loss_pg: 0.5878 (-0.6902) +03/18 22:48:04 INFO train_distill_dimo.py:745] Train mean_logp_tok: -5.4062 (-5.7880) +03/18 22:48:04 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4635 (8.4639) +03/18 22:48:04 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:50:30 INFO train_distill_dimo.py:734] Iteration 2750, lr_s=8.81e-06 lr_a=8.81e-06, time=2.72s +03/18 22:50:30 INFO train_distill_dimo.py:745] Train H_mean: 9.0938 (8.7341) +03/18 22:50:30 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2651 (-0.2681) +03/18 22:50:30 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3460 (0.4165) +03/18 22:50:30 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2269 (0.2538) +03/18 22:50:30 INFO train_distill_dimo.py:745] Train loss_pg: 0.4285 (0.5541) +03/18 22:50:30 INFO train_distill_dimo.py:745] Train mean_logp_tok: -8.9688 (-8.7355) +03/18 22:50:30 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4725 (8.4737) +03/18 22:50:30 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:52:54 INFO train_distill_dimo.py:734] Iteration 2800, lr_s=8.76e-06 lr_a=8.76e-06, time=2.74s +03/18 22:52:54 INFO train_distill_dimo.py:745] Train H_mean: 9.5625 (9.1528) +03/18 22:52:54 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2492 (-0.2463) +03/18 22:52:54 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2628 (0.4922) +03/18 22:52:54 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1828 (0.2635) +03/18 22:52:54 INFO train_distill_dimo.py:745] Train loss_pg: 0.7066 (0.4807) +03/18 22:52:54 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.7500 (-9.2741) +03/18 22:52:54 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4753 (8.4764) +03/18 22:52:54 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:55:19 INFO train_distill_dimo.py:734] Iteration 2850, lr_s=8.71e-06 lr_a=8.71e-06, time=2.70s +03/18 22:55:19 INFO train_distill_dimo.py:745] Train H_mean: 8.8125 (8.8663) +03/18 22:55:19 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2237 (-0.2233) +03/18 22:55:19 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3219 (0.3553) +03/18 22:55:19 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2188 (0.2302) +03/18 22:55:19 INFO train_distill_dimo.py:745] Train loss_pg: 0.7536 (0.2922) +03/18 22:55:19 INFO train_distill_dimo.py:745] Train mean_logp_tok: -8.7812 (-8.8931) +03/18 22:55:19 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4709 (8.4739) +03/18 22:55:19 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 22:57:45 INFO train_distill_dimo.py:734] Iteration 2900, lr_s=8.66e-06 lr_a=8.66e-06, time=4.00s +03/18 22:57:45 INFO train_distill_dimo.py:745] Train H_mean: 10.6562 (10.0400) +03/18 22:57:45 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2013 (-0.2018) +03/18 22:57:45 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2932 (0.3832) +03/18 22:57:45 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1731 (0.2200) +03/18 22:57:45 INFO train_distill_dimo.py:745] Train loss_pg: 0.7274 (0.2968) +03/18 22:57:45 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.7188 (-10.0550) +03/18 22:57:45 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4763 (8.4756) +03/18 22:57:45 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:00:07 INFO train_distill_dimo.py:734] 
Iteration 2950, lr_s=8.60e-06 lr_a=8.60e-06, time=2.78s +03/18 23:00:07 INFO train_distill_dimo.py:745] Train H_mean: 9.6562 (8.9588) +03/18 23:00:07 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1967 (-0.1959) +03/18 23:00:07 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2834 (0.5429) +03/18 23:00:07 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1801 (0.3061) +03/18 23:00:07 INFO train_distill_dimo.py:745] Train loss_pg: 0.4340 (-0.0931) +03/18 23:00:07 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.6562 (-9.2066) +03/18 23:00:07 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4730 (8.4739) +03/18 23:00:07 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:02:30 INFO train_distill_dimo.py:734] Iteration 3000, lr_s=8.55e-06 lr_a=8.55e-06, time=2.74s +03/18 23:02:30 INFO train_distill_dimo.py:745] Train H_mean: 9.4375 (9.1108) +03/18 23:02:30 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1900 (-0.1906) +03/18 23:02:30 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2958 (0.4021) +03/18 23:02:30 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1759 (0.2189) +03/18 23:02:30 INFO train_distill_dimo.py:745] Train loss_pg: 0.5363 (0.2300) +03/18 23:02:30 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.4688 (-9.1995) +03/18 23:02:30 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4742 (8.4755) +03/18 23:02:30 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:02:30 INFO train_distill_dimo.py:667] < PROGRESS: 30.01% | SPEED: 2.900s / step | ETA: 5:38:23 > +03/18 23:02:50 INFO train_distill_dimo.py:720] [save] step=3000 → ./experiments/distill_dimo/checkpoints/checkpoint-3000 +03/18 23:05:14 INFO train_distill_dimo.py:734] Iteration 3050, lr_s=8.49e-06 lr_a=8.49e-06, time=2.70s +03/18 23:05:14 INFO train_distill_dimo.py:745] Train H_mean: 9.7500 (9.4227) +03/18 23:05:14 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1904 (-0.1894) +03/18 23:05:14 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3064 (0.3809) +03/18 23:05:14 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1925 (0.2352) +03/18 23:05:14 INFO train_distill_dimo.py:745] Train loss_pg: 0.4732 (-0.0103) +03/18 23:05:14 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.8750 (-9.4509) +03/18 23:05:14 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4740 (8.4751) +03/18 23:05:14 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:07:39 INFO train_distill_dimo.py:734] Iteration 3100, lr_s=8.44e-06 lr_a=8.44e-06, time=2.73s +03/18 23:07:39 INFO train_distill_dimo.py:745] Train H_mean: 10.2188 (9.5797) +03/18 23:07:39 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1796 (-0.1796) +03/18 23:07:39 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2618 (0.3240) +03/18 23:07:39 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1650 (0.2030) +03/18 23:07:39 INFO train_distill_dimo.py:745] Train loss_pg: 0.3001 (-0.0516) +03/18 23:07:39 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.3438 (-9.6344) +03/18 23:07:39 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4777 (8.4766) +03/18 23:07:39 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:10:04 INFO train_distill_dimo.py:734] Iteration 3150, lr_s=8.38e-06 lr_a=8.38e-06, time=3.16s +03/18 23:10:04 INFO train_distill_dimo.py:745] Train H_mean: 9.9688 (9.2291) +03/18 23:10:04 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1851 (-0.1857) 
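
The `< PROGRESS | SPEED | ETA >` lines are plain arithmetic over the running mean step time: at the 30.01% entry above, 10000 - 3001 = 6999 remaining steps at roughly 2.900 s/step is about 5h38m. A small sketch that reproduces the line (the exact formatting is an assumption):

    import datetime

    def progress_line(step, max_steps, avg_step_time):
        """Rebuild the logged '< PROGRESS | SPEED | ETA >' line (assumed format)."""
        pct = 100.0 * step / max_steps
        eta = datetime.timedelta(seconds=int((max_steps - step) * avg_step_time))
        return f"< PROGRESS: {pct:.2f}% | SPEED: {avg_step_time:.3f}s / step | ETA: {eta} >"

    print(progress_line(3001, 10000, 2.9009))
    # < PROGRESS: 30.01% | SPEED: 2.901s / step | ETA: 5:38:23 >
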
+03/18 23:10:04 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3277 (0.4222) +03/18 23:10:04 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1987 (0.2443) +03/18 23:10:04 INFO train_distill_dimo.py:745] Train loss_pg: 0.1531 (-0.2127) +03/18 23:10:04 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.0000 (-9.1969) +03/18 23:10:04 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4751 (8.4751) +03/18 23:10:04 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:12:28 INFO train_distill_dimo.py:734] Iteration 3200, lr_s=8.32e-06 lr_a=8.32e-06, time=3.55s +03/18 23:12:28 INFO train_distill_dimo.py:745] Train H_mean: 9.9375 (9.9700) +03/18 23:12:28 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1873 (-0.1877) +03/18 23:12:28 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3112 (0.3620) +03/18 23:12:28 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1964 (0.2074) +03/18 23:12:28 INFO train_distill_dimo.py:745] Train loss_pg: 0.7075 (0.1787) +03/18 23:12:28 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.9688 (-10.0281) +03/18 23:12:28 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4790 (8.4769) +03/18 23:12:28 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:14:53 INFO train_distill_dimo.py:734] Iteration 3250, lr_s=8.27e-06 lr_a=8.27e-06, time=2.73s +03/18 23:14:53 INFO train_distill_dimo.py:745] Train H_mean: 9.6875 (8.8659) +03/18 23:14:53 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1925 (-0.1926) +03/18 23:14:53 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3091 (0.4185) +03/18 23:14:53 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1851 (0.2267) +03/18 23:14:53 INFO train_distill_dimo.py:745] Train loss_pg: 0.2102 (-0.2807) +03/18 23:14:53 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.7812 (-8.9216) +03/18 23:14:53 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4743 (8.4754) +03/18 23:14:53 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:17:18 INFO train_distill_dimo.py:734] Iteration 3300, lr_s=8.21e-06 lr_a=8.21e-06, time=2.74s +03/18 23:17:18 INFO train_distill_dimo.py:745] Train H_mean: 10.1562 (9.8791) +03/18 23:17:18 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1899 (-0.1913) +03/18 23:17:18 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2916 (0.3763) +03/18 23:17:18 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1880 (0.2611) +03/18 23:17:18 INFO train_distill_dimo.py:745] Train loss_pg: 0.4316 (-0.1211) +03/18 23:17:18 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.2188 (-9.9025) +03/18 23:17:18 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4742 (8.4761) +03/18 23:17:18 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:19:42 INFO train_distill_dimo.py:734] Iteration 3350, lr_s=8.15e-06 lr_a=8.15e-06, time=2.71s +03/18 23:19:42 INFO train_distill_dimo.py:745] Train H_mean: 9.7500 (9.3331) +03/18 23:19:42 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1894 (-0.1906) +03/18 23:19:42 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3297 (0.4318) +03/18 23:19:42 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1885 (0.2416) +03/18 23:19:42 INFO train_distill_dimo.py:745] Train loss_pg: 0.5261 (0.1148) +03/18 23:19:42 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.7812 (-9.4512) +03/18 23:19:42 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4750 (8.4756) +03/18 23:19:42 INFO 
train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:22:07 INFO train_distill_dimo.py:734] Iteration 3400, lr_s=8.09e-06 lr_a=8.09e-06, time=2.71s +03/18 23:22:07 INFO train_distill_dimo.py:745] Train H_mean: 9.6250 (9.6569) +03/18 23:22:07 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1845 (-0.1847) +03/18 23:22:07 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3282 (0.4758) +03/18 23:22:07 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1997 (0.2676) +03/18 23:22:07 INFO train_distill_dimo.py:745] Train loss_pg: 0.3394 (0.0760) +03/18 23:22:07 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.7812 (-9.6431) +03/18 23:22:07 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4788 (8.4768) +03/18 23:22:07 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:24:32 INFO train_distill_dimo.py:734] Iteration 3450, lr_s=8.02e-06 lr_a=8.02e-06, time=3.62s +03/18 23:24:32 INFO train_distill_dimo.py:745] Train H_mean: 10.0000 (9.5691) +03/18 23:24:32 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1779 (-0.1779) +03/18 23:24:32 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2875 (0.4175) +03/18 23:24:32 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1719 (0.2235) +03/18 23:24:32 INFO train_distill_dimo.py:745] Train loss_pg: 0.3163 (0.1205) +03/18 23:24:32 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.9688 (-9.5697) +03/18 23:24:32 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4749 (8.4749) +03/18 23:24:32 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:26:54 INFO train_distill_dimo.py:734] Iteration 3500, lr_s=7.96e-06 lr_a=7.96e-06, time=2.73s +03/18 23:26:54 INFO train_distill_dimo.py:745] Train H_mean: 10.4688 (10.2075) +03/18 23:26:54 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1735 (-0.1737) +03/18 23:26:54 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2848 (0.3539) +03/18 23:26:54 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1755 (0.2134) +03/18 23:26:54 INFO train_distill_dimo.py:745] Train loss_pg: 0.4074 (-0.0260) +03/18 23:26:54 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.4688 (-10.2225) +03/18 23:26:54 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4765 (8.4762) +03/18 23:26:54 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:26:54 INFO train_distill_dimo.py:667] < PROGRESS: 35.01% | SPEED: 2.899s / step | ETA: 5:14:02 > +03/18 23:29:20 INFO train_distill_dimo.py:734] Iteration 3550, lr_s=7.90e-06 lr_a=7.90e-06, time=2.75s +03/18 23:29:20 INFO train_distill_dimo.py:745] Train H_mean: 10.5625 (10.0569) +03/18 23:29:20 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1808 (-0.1811) +03/18 23:29:20 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3658 (0.4729) +03/18 23:29:20 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2264 (0.2704) +03/18 23:29:20 INFO train_distill_dimo.py:745] Train loss_pg: 0.1024 (-0.2856) +03/18 23:29:20 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5625 (-10.0756) +03/18 23:29:20 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4792 (8.4764) +03/18 23:29:20 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:31:46 INFO train_distill_dimo.py:734] Iteration 3600, lr_s=7.84e-06 lr_a=7.84e-06, time=2.72s +03/18 23:31:46 INFO train_distill_dimo.py:745] Train H_mean: 10.0000 (9.9378) +03/18 23:31:46 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1904 (-0.1886) 
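
Throughout these logs each metric prints as `current (smoothed)`, and H_mean closely tracks -mean_logp_tok, i.e. the per-token negative log-probability of the student's samples. Together with the separate baseline_ema series, one plausible reading is an exponential-moving-average meter reused as the reward baseline for the policy-gradient term; a minimal sketch under that assumption (the 0.99 decay and the names are invented for illustration):

    class EMAMeter:
        """Exponential moving average; printed as 'value (ema)' per metric."""
        def __init__(self, decay=0.99):   # decay is an assumed value
            self.decay, self.ema = decay, None

        def update(self, value):
            self.ema = value if self.ema is None else (
                self.decay * self.ema + (1.0 - self.decay) * value)
            return value, self.ema

    # As a reward baseline: advantage = reward - meter.ema, which keeps the
    # policy-gradient loss (loss_pg) roughly zero-centered early in training.
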
+03/18 23:31:46 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3021 (0.3466) +03/18 23:31:46 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1774 (0.2385) +03/18 23:31:46 INFO train_distill_dimo.py:745] Train loss_pg: 0.4929 (-0.0604) +03/18 23:31:46 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.0000 (-9.9541) +03/18 23:31:46 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4737 (8.4758) +03/18 23:31:46 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:34:12 INFO train_distill_dimo.py:734] Iteration 3650, lr_s=7.77e-06 lr_a=7.77e-06, time=2.70s +03/18 23:34:12 INFO train_distill_dimo.py:745] Train H_mean: 9.1875 (8.5920) +03/18 23:34:12 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1890 (-0.1891) +03/18 23:34:12 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2916 (0.3681) +03/18 23:34:12 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1702 (0.2362) +03/18 23:34:12 INFO train_distill_dimo.py:745] Train loss_pg: 0.1907 (-0.0997) +03/18 23:34:12 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.3125 (-8.6975) +03/18 23:34:12 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4738 (8.4740) +03/18 23:34:12 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:36:38 INFO train_distill_dimo.py:734] Iteration 3700, lr_s=7.71e-06 lr_a=7.71e-06, time=2.76s +03/18 23:36:38 INFO train_distill_dimo.py:745] Train H_mean: 9.6250 (9.0472) +03/18 23:36:38 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1941 (-0.1950) +03/18 23:36:38 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2795 (0.3722) +03/18 23:36:38 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1700 (0.2083) +03/18 23:36:38 INFO train_distill_dimo.py:745] Train loss_pg: 0.6598 (0.1691) +03/18 23:36:38 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.5312 (-8.9913) +03/18 23:36:38 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4767 (8.4756) +03/18 23:36:38 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:39:03 INFO train_distill_dimo.py:734] Iteration 3750, lr_s=7.64e-06 lr_a=7.64e-06, time=3.50s +03/18 23:39:03 INFO train_distill_dimo.py:745] Train H_mean: 10.0938 (10.1006) +03/18 23:39:03 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1902 (-0.1906) +03/18 23:39:03 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3033 (0.4341) +03/18 23:39:03 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1932 (0.2395) +03/18 23:39:03 INFO train_distill_dimo.py:745] Train loss_pg: 0.2330 (0.0750) +03/18 23:39:03 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.1250 (-10.1156) +03/18 23:39:03 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4767 (8.4767) +03/18 23:39:03 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:41:28 INFO train_distill_dimo.py:734] Iteration 3800, lr_s=7.58e-06 lr_a=7.58e-06, time=2.72s +03/18 23:41:28 INFO train_distill_dimo.py:745] Train H_mean: 10.0000 (9.9672) +03/18 23:41:28 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1992 (-0.2135) +03/18 23:41:28 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3212 (0.4034) +03/18 23:41:28 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1984 (0.3543) +03/18 23:41:28 INFO train_distill_dimo.py:745] Train loss_pg: 0.4953 (-1.0337) +03/18 23:41:28 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.9688 (-9.9822) +03/18 23:41:28 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4774 (8.4760) +03/18 23:41:28 INFO 
train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:43:55 INFO train_distill_dimo.py:734] Iteration 3850, lr_s=7.51e-06 lr_a=7.51e-06, time=2.73s +03/18 23:43:55 INFO train_distill_dimo.py:745] Train H_mean: 9.8438 (9.9675) +03/18 23:43:55 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2283 (-0.2293) +03/18 23:43:55 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3755 (0.4256) +03/18 23:43:55 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1964 (0.2284) +03/18 23:43:55 INFO train_distill_dimo.py:745] Train loss_pg: 0.7613 (0.3588) +03/18 23:43:55 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.9375 (-9.9987) +03/18 23:43:55 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4773 (8.4769) +03/18 23:43:55 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:46:21 INFO train_distill_dimo.py:734] Iteration 3900, lr_s=7.44e-06 lr_a=7.44e-06, time=2.71s +03/18 23:46:21 INFO train_distill_dimo.py:745] Train H_mean: 9.9688 (10.1725) +03/18 23:46:21 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2140 (-0.2137) +03/18 23:46:21 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3087 (0.4091) +03/18 23:46:21 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1594 (0.2164) +03/18 23:46:21 INFO train_distill_dimo.py:745] Train loss_pg: 0.6454 (0.3488) +03/18 23:46:21 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.0312 (-10.1837) +03/18 23:46:21 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4757 (8.4759) +03/18 23:46:21 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:48:47 INFO train_distill_dimo.py:734] Iteration 3950, lr_s=7.38e-06 lr_a=7.38e-06, time=2.73s +03/18 23:48:47 INFO train_distill_dimo.py:745] Train H_mean: 10.5312 (10.1363) +03/18 23:48:47 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2006 (-0.2004) +03/18 23:48:47 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3402 (0.4469) +03/18 23:48:47 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1869 (0.2515) +03/18 23:48:47 INFO train_distill_dimo.py:745] Train loss_pg: 0.4139 (-0.0178) +03/18 23:48:47 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5625 (-10.1587) +03/18 23:48:47 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4787 (8.4763) +03/18 23:48:47 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:51:13 INFO train_distill_dimo.py:734] Iteration 4000, lr_s=7.31e-06 lr_a=7.31e-06, time=3.10s +03/18 23:51:13 INFO train_distill_dimo.py:745] Train H_mean: 9.8438 (9.8100) +03/18 23:51:13 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1974 (-0.1954) +03/18 23:51:13 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2790 (0.3901) +03/18 23:51:13 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1794 (0.2100) +03/18 23:51:13 INFO train_distill_dimo.py:745] Train loss_pg: 0.5218 (0.2055) +03/18 23:51:13 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.8125 (-9.8091) +03/18 23:51:13 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4765 (8.4767) +03/18 23:51:13 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:51:13 INFO train_distill_dimo.py:667] < PROGRESS: 40.01% | SPEED: 2.901s / step | ETA: 4:50:06 > +03/18 23:51:31 INFO train_distill_dimo.py:720] [save] step=4000 → ./experiments/distill_dimo/checkpoints/checkpoint-4000 +03/18 23:53:57 INFO train_distill_dimo.py:734] Iteration 4050, lr_s=7.24e-06 lr_a=7.24e-06, time=2.71s +03/18 23:53:57 INFO train_distill_dimo.py:745] 
Train H_mean: 9.3125 (9.3031) +03/18 23:53:57 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2030 (-0.1996) +03/18 23:53:57 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3051 (0.3405) +03/18 23:53:57 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1755 (0.2413) +03/18 23:53:57 INFO train_distill_dimo.py:745] Train loss_pg: 0.6019 (-0.2030) +03/18 23:53:57 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.2812 (-9.3353) +03/18 23:53:57 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4754 (8.4754) +03/18 23:53:57 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:56:23 INFO train_distill_dimo.py:734] Iteration 4100, lr_s=7.17e-06 lr_a=7.17e-06, time=2.74s +03/18 23:56:23 INFO train_distill_dimo.py:745] Train H_mean: 9.4375 (9.2091) +03/18 23:56:23 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2001 (-0.2006) +03/18 23:56:23 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2864 (0.3571) +03/18 23:56:23 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1636 (0.2128) +03/18 23:56:23 INFO train_distill_dimo.py:745] Train loss_pg: 0.4187 (-0.0133) +03/18 23:56:23 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.5000 (-9.2131) +03/18 23:56:23 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4759 (8.4767) +03/18 23:56:23 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/18 23:58:48 INFO train_distill_dimo.py:734] Iteration 4150, lr_s=7.10e-06 lr_a=7.10e-06, time=2.71s +03/18 23:58:48 INFO train_distill_dimo.py:745] Train H_mean: 10.4062 (10.0175) +03/18 23:58:48 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2037 (-0.2047) +03/18 23:58:48 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2537 (0.3975) +03/18 23:58:48 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1642 (0.2176) +03/18 23:58:48 INFO train_distill_dimo.py:745] Train loss_pg: 0.8601 (-0.0270) +03/18 23:58:48 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.4062 (-10.0400) +03/18 23:58:48 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4784 (8.4765) +03/18 23:58:48 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:01:16 INFO train_distill_dimo.py:734] Iteration 4200, lr_s=7.03e-06 lr_a=7.03e-06, time=2.74s +03/19 00:01:16 INFO train_distill_dimo.py:745] Train H_mean: 9.4688 (8.7209) +03/19 00:01:16 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2011 (-0.2006) +03/19 00:01:16 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3295 (0.3806) +03/19 00:01:16 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1766 (0.2176) +03/19 00:01:16 INFO train_distill_dimo.py:745] Train loss_pg: 0.3896 (0.1027) +03/19 00:01:16 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.5312 (-8.8367) +03/19 00:01:16 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4707 (8.4732) +03/19 00:01:16 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:03:42 INFO train_distill_dimo.py:734] Iteration 4250, lr_s=6.96e-06 lr_a=6.96e-06, time=3.15s +03/19 00:03:42 INFO train_distill_dimo.py:745] Train H_mean: 10.1250 (10.0600) +03/19 00:03:42 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2023 (-0.2025) +03/19 00:03:42 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3293 (0.5051) +03/19 00:03:42 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2070 (0.2731) +03/19 00:03:42 INFO train_distill_dimo.py:745] Train loss_pg: 0.3981 (-0.0634) +03/19 00:03:42 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.2812 
(-10.0556) +03/19 00:03:42 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4775 (8.4771) +03/19 00:03:42 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:06:07 INFO train_distill_dimo.py:734] Iteration 4300, lr_s=6.89e-06 lr_a=6.89e-06, time=3.59s +03/19 00:06:07 INFO train_distill_dimo.py:745] Train H_mean: 9.7812 (9.6366) +03/19 00:06:07 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1939 (-0.1947) +03/19 00:06:07 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2664 (0.3359) +03/19 00:06:07 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1729 (0.1892) +03/19 00:06:07 INFO train_distill_dimo.py:745] Train loss_pg: 0.4284 (0.2210) +03/19 00:06:07 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.7500 (-9.6750) +03/19 00:06:07 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4780 (8.4776) +03/19 00:06:07 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:08:33 INFO train_distill_dimo.py:734] Iteration 4350, lr_s=6.82e-06 lr_a=6.82e-06, time=2.72s +03/19 00:08:33 INFO train_distill_dimo.py:745] Train H_mean: 9.7500 (9.7519) +03/19 00:08:33 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2154 (-0.2180) +03/19 00:08:33 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2918 (0.3540) +03/19 00:08:33 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1684 (0.3481) +03/19 00:08:33 INFO train_distill_dimo.py:745] Train loss_pg: 0.7949 (-1.0005) +03/19 00:08:33 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.7812 (-9.7534) +03/19 00:08:33 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4767 (8.4766) +03/19 00:08:33 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:11:01 INFO train_distill_dimo.py:734] Iteration 4400, lr_s=6.75e-06 lr_a=6.75e-06, time=2.72s +03/19 00:11:01 INFO train_distill_dimo.py:745] Train H_mean: 10.0938 (10.1200) +03/19 00:11:01 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2329 (-0.2313) +03/19 00:11:01 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3272 (0.3653) +03/19 00:11:01 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1879 (0.2160) +03/19 00:11:01 INFO train_distill_dimo.py:745] Train loss_pg: 1.0763 (0.3868) +03/19 00:11:01 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.1562 (-10.1444) +03/19 00:11:01 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4765 (8.4764) +03/19 00:11:01 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:13:27 INFO train_distill_dimo.py:734] Iteration 4450, lr_s=6.68e-06 lr_a=6.68e-06, time=2.75s +03/19 00:13:27 INFO train_distill_dimo.py:745] Train H_mean: 9.8750 (9.6225) +03/19 00:13:27 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2186 (-0.2172) +03/19 00:13:27 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3084 (0.3726) +03/19 00:13:27 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1902 (0.2091) +03/19 00:13:27 INFO train_distill_dimo.py:745] Train loss_pg: 0.4353 (0.2027) +03/19 00:13:27 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.9062 (-9.6188) +03/19 00:13:27 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4761 (8.4764) +03/19 00:13:27 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:15:53 INFO train_distill_dimo.py:734] Iteration 4500, lr_s=6.61e-06 lr_a=6.61e-06, time=2.71s +03/19 00:15:53 INFO train_distill_dimo.py:745] Train H_mean: 10.1562 (10.0181) +03/19 00:15:53 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2019 (-0.2015) 
+03/19 00:15:53 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3287 (0.3831) +03/19 00:15:53 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2021 (0.2185) +03/19 00:15:53 INFO train_distill_dimo.py:745] Train loss_pg: 0.5475 (0.1254) +03/19 00:15:53 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.1562 (-10.0213) +03/19 00:15:53 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4778 (8.4777) +03/19 00:15:53 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:15:53 INFO train_distill_dimo.py:667] < PROGRESS: 45.01% | SPEED: 2.904s / step | ETA: 4:26:09 > +03/19 00:18:20 INFO train_distill_dimo.py:734] Iteration 4550, lr_s=6.53e-06 lr_a=6.53e-06, time=3.63s +03/19 00:18:20 INFO train_distill_dimo.py:745] Train H_mean: 9.8125 (9.5264) +03/19 00:18:20 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1964 (-0.1971) +03/19 00:18:20 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3361 (0.3883) +03/19 00:18:20 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1865 (0.2320) +03/19 00:18:20 INFO train_distill_dimo.py:745] Train loss_pg: 0.3992 (0.0072) +03/19 00:18:20 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.8438 (-9.5523) +03/19 00:18:20 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4763 (8.4755) +03/19 00:18:20 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:20:44 INFO train_distill_dimo.py:734] Iteration 4600, lr_s=6.46e-06 lr_a=6.46e-06, time=3.15s +03/19 00:20:45 INFO train_distill_dimo.py:745] Train H_mean: 9.8125 (9.4109) +03/19 00:20:45 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2028 (-0.2021) +03/19 00:20:45 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3600 (0.4958) +03/19 00:20:45 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2209 (0.2449) +03/19 00:20:45 INFO train_distill_dimo.py:745] Train loss_pg: 0.2841 (-0.1524) +03/19 00:20:45 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.8438 (-9.4369) +03/19 00:20:45 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4729 (8.4742) +03/19 00:20:45 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:23:11 INFO train_distill_dimo.py:734] Iteration 4650, lr_s=6.39e-06 lr_a=6.39e-06, time=2.73s +03/19 00:23:11 INFO train_distill_dimo.py:745] Train H_mean: 10.1250 (9.9128) +03/19 00:23:11 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1892 (-0.1897) +03/19 00:23:11 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2566 (0.3070) +03/19 00:23:11 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1483 (0.1735) +03/19 00:23:11 INFO train_distill_dimo.py:745] Train loss_pg: 0.5787 (0.3799) +03/19 00:23:11 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.4375 (-9.9584) +03/19 00:23:11 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4768 (8.4775) +03/19 00:23:11 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:25:37 INFO train_distill_dimo.py:734] Iteration 4700, lr_s=6.32e-06 lr_a=6.32e-06, time=2.72s +03/19 00:25:37 INFO train_distill_dimo.py:745] Train H_mean: 10.0938 (9.8644) +03/19 00:25:37 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1771 (-0.1887) +03/19 00:25:37 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2910 (0.4274) +03/19 00:25:37 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1866 (0.3097) +03/19 00:25:37 INFO train_distill_dimo.py:745] Train loss_pg: 0.3249 (-0.9380) +03/19 00:25:37 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.1250 (-9.8631) +03/19 
00:25:37 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4763 (8.4773) +03/19 00:25:37 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:28:03 INFO train_distill_dimo.py:734] Iteration 4750, lr_s=6.24e-06 lr_a=6.24e-06, time=2.74s +03/19 00:28:03 INFO train_distill_dimo.py:745] Train H_mean: 10.1562 (10.0981) +03/19 00:28:03 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2115 (-0.2101) +03/19 00:28:03 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3259 (0.3492) +03/19 00:28:03 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1725 (0.1933) +03/19 00:28:03 INFO train_distill_dimo.py:745] Train loss_pg: 0.9903 (0.4903) +03/19 00:28:03 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.1875 (-10.1275) +03/19 00:28:03 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4769 (8.4764) +03/19 00:28:03 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:30:30 INFO train_distill_dimo.py:734] Iteration 4800, lr_s=6.17e-06 lr_a=6.17e-06, time=3.19s +03/19 00:30:30 INFO train_distill_dimo.py:745] Train H_mean: 9.7500 (9.9316) +03/19 00:30:30 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1912 (-0.1930) +03/19 00:30:30 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3122 (0.4534) +03/19 00:30:30 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1583 (0.2355) +03/19 00:30:30 INFO train_distill_dimo.py:745] Train loss_pg: 0.5349 (0.1364) +03/19 00:30:30 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.9375 (-9.9678) +03/19 00:30:30 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4772 (8.4766) +03/19 00:30:30 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:32:56 INFO train_distill_dimo.py:734] Iteration 4850, lr_s=6.09e-06 lr_a=6.09e-06, time=3.58s +03/19 00:32:56 INFO train_distill_dimo.py:745] Train H_mean: 9.5312 (8.7112) +03/19 00:32:56 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1851 (-0.1849) +03/19 00:32:56 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3154 (0.4106) +03/19 00:32:56 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1812 (0.2248) +03/19 00:32:56 INFO train_distill_dimo.py:745] Train loss_pg: 0.2127 (-0.0435) +03/19 00:32:56 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.5312 (-8.6884) +03/19 00:32:56 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4712 (8.4731) +03/19 00:32:56 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:35:21 INFO train_distill_dimo.py:734] Iteration 4900, lr_s=6.02e-06 lr_a=6.02e-06, time=2.72s +03/19 00:35:21 INFO train_distill_dimo.py:745] Train H_mean: 10.2188 (10.0416) +03/19 00:35:21 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1834 (-0.1823) +03/19 00:35:21 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3025 (0.3238) +03/19 00:35:21 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1578 (0.1796) +03/19 00:35:21 INFO train_distill_dimo.py:745] Train loss_pg: 0.4026 (0.2414) +03/19 00:35:21 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.1875 (-10.0261) +03/19 00:35:21 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4783 (8.4769) +03/19 00:35:21 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:37:47 INFO train_distill_dimo.py:734] Iteration 4950, lr_s=5.95e-06 lr_a=5.95e-06, time=2.73s +03/19 00:37:47 INFO train_distill_dimo.py:745] Train H_mean: 10.7500 (10.2269) +03/19 00:37:47 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1791 (-0.1792) +03/19 00:37:47 
INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3281 (0.3815) +03/19 00:37:47 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1815 (0.2173) +03/19 00:37:47 INFO train_distill_dimo.py:745] Train loss_pg: 0.4701 (-0.1853) +03/19 00:37:47 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.7812 (-10.2406) +03/19 00:37:47 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4791 (8.4776) +03/19 00:37:47 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:40:14 INFO train_distill_dimo.py:734] Iteration 5000, lr_s=5.87e-06 lr_a=5.87e-06, time=2.81s +03/19 00:40:14 INFO train_distill_dimo.py:745] Train H_mean: 9.9375 (9.7134) +03/19 00:40:14 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1857 (-0.1853) +03/19 00:40:14 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3159 (0.3631) +03/19 00:40:14 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.2010 (0.2141) +03/19 00:40:14 INFO train_distill_dimo.py:745] Train loss_pg: 0.1092 (-0.2035) +03/19 00:40:14 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.9062 (-9.7059) +03/19 00:40:14 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4754 (8.4765) +03/19 00:40:14 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:40:14 INFO train_distill_dimo.py:667] < PROGRESS: 50.01% | SPEED: 2.905s / step | ETA: 4:02:06 > +03/19 00:40:32 INFO train_distill_dimo.py:720] [save] step=5000 → ./experiments/distill_dimo/checkpoints/checkpoint-5000 +03/19 00:42:59 INFO train_distill_dimo.py:734] Iteration 5050, lr_s=5.80e-06 lr_a=5.80e-06, time=2.72s +03/19 00:42:59 INFO train_distill_dimo.py:745] Train H_mean: 9.8125 (9.7594) +03/19 00:42:59 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1866 (-0.1885) +03/19 00:42:59 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2696 (0.3853) +03/19 00:42:59 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1653 (0.3084) +03/19 00:42:59 INFO train_distill_dimo.py:745] Train loss_pg: 0.7684 (-0.6864) +03/19 00:42:59 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.8125 (-9.8903) +03/19 00:42:59 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4765 (8.4757) +03/19 00:42:59 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:45:26 INFO train_distill_dimo.py:734] Iteration 5100, lr_s=5.72e-06 lr_a=5.72e-06, time=3.19s +03/19 00:45:26 INFO train_distill_dimo.py:745] Train H_mean: 10.0000 (9.9731) +03/19 00:45:26 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2293 (-0.2278) +03/19 00:45:26 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2960 (0.4313) +03/19 00:45:26 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1802 (0.2382) +03/19 00:45:26 INFO train_distill_dimo.py:745] Train loss_pg: 0.5400 (0.1313) +03/19 00:45:26 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.0312 (-9.9656) +03/19 00:45:26 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4763 (8.4762) +03/19 00:45:26 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:47:50 INFO train_distill_dimo.py:734] Iteration 5150, lr_s=5.65e-06 lr_a=5.65e-06, time=3.12s +03/19 00:47:50 INFO train_distill_dimo.py:745] Train H_mean: 10.5625 (10.2375) +03/19 00:47:50 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2065 (-0.2068) +03/19 00:47:50 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2603 (0.3198) +03/19 00:47:50 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1551 (0.1865) +03/19 00:47:50 INFO train_distill_dimo.py:745] Train loss_pg: 
1.0162 (0.4504) +03/19 00:47:50 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5000 (-10.2462) +03/19 00:47:50 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4754 (8.4768) +03/19 00:47:50 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:50:16 INFO train_distill_dimo.py:734] Iteration 5200, lr_s=5.58e-06 lr_a=5.58e-06, time=2.72s +03/19 00:50:16 INFO train_distill_dimo.py:745] Train H_mean: 10.4688 (10.0366) +03/19 00:50:16 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1989 (-0.1993) +03/19 00:50:16 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3164 (0.3862) +03/19 00:50:16 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1757 (0.2151) +03/19 00:50:16 INFO train_distill_dimo.py:745] Train loss_pg: 0.4993 (0.0207) +03/19 00:50:16 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.4375 (-10.0275) +03/19 00:50:16 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4751 (8.4764) +03/19 00:50:16 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:52:42 INFO train_distill_dimo.py:734] Iteration 5250, lr_s=5.50e-06 lr_a=5.50e-06, time=2.77s +03/19 00:52:42 INFO train_distill_dimo.py:745] Train H_mean: 10.3750 (10.1556) +03/19 00:52:42 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1872 (-0.1859) +03/19 00:52:42 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2895 (0.3713) +03/19 00:52:42 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1700 (0.2050) +03/19 00:52:42 INFO train_distill_dimo.py:745] Train loss_pg: 0.5072 (0.1842) +03/19 00:52:42 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.0000 (-10.1375) +03/19 00:52:42 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4764 (8.4770) +03/19 00:52:42 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:55:10 INFO train_distill_dimo.py:734] Iteration 5300, lr_s=5.43e-06 lr_a=5.43e-06, time=2.72s +03/19 00:55:10 INFO train_distill_dimo.py:745] Train H_mean: 10.0312 (10.1538) +03/19 00:55:10 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1765 (-0.1765) +03/19 00:55:10 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2428 (0.4006) +03/19 00:55:10 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1519 (0.2263) +03/19 00:55:10 INFO train_distill_dimo.py:745] Train loss_pg: 0.6162 (0.1703) +03/19 00:55:10 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.0000 (-10.1725) +03/19 00:55:10 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4770 (8.4779) +03/19 00:55:10 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 00:57:36 INFO train_distill_dimo.py:734] Iteration 5350, lr_s=5.35e-06 lr_a=5.35e-06, time=2.74s +03/19 00:57:36 INFO train_distill_dimo.py:745] Train H_mean: 10.1562 (10.1075) +03/19 00:57:36 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1759 (-0.1765) +03/19 00:57:36 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3256 (0.4100) +03/19 00:57:36 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1848 (0.2171) +03/19 00:57:36 INFO train_distill_dimo.py:745] Train loss_pg: 0.4530 (-0.4036) +03/19 00:57:36 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.3438 (-10.1200) +03/19 00:57:36 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4777 (8.4767) +03/19 00:57:36 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:00:01 INFO train_distill_dimo.py:734] Iteration 5400, lr_s=5.28e-06 lr_a=5.28e-06, time=3.17s +03/19 01:00:01 INFO train_distill_dimo.py:745] Train H_mean: 
9.9688 (9.9413) +03/19 01:00:01 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1866 (-0.1899) +03/19 01:00:01 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2463 (0.2938) +03/19 01:00:01 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1470 (0.2567) +03/19 01:00:01 INFO train_distill_dimo.py:745] Train loss_pg: 0.5368 (-0.3475) +03/19 01:00:01 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.9688 (-9.9619) +03/19 01:00:01 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4756 (8.4765) +03/19 01:00:01 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:02:27 INFO train_distill_dimo.py:734] Iteration 5450, lr_s=5.20e-06 lr_a=5.20e-06, time=2.72s +03/19 01:02:27 INFO train_distill_dimo.py:745] Train H_mean: 9.7188 (9.1111) +03/19 01:02:27 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2122 (-0.2140) +03/19 01:02:27 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3125 (0.4591) +03/19 01:02:27 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1535 (0.2327) +03/19 01:02:27 INFO train_distill_dimo.py:745] Train loss_pg: 0.2852 (-0.1454) +03/19 01:02:27 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.7812 (-9.2241) +03/19 01:02:27 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4743 (8.4750) +03/19 01:02:27 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:04:55 INFO train_distill_dimo.py:734] Iteration 5500, lr_s=5.13e-06 lr_a=5.13e-06, time=2.72s +03/19 01:04:55 INFO train_distill_dimo.py:745] Train H_mean: 10.1875 (9.9128) +03/19 01:04:55 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2191 (-0.2186) +03/19 01:04:55 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2600 (0.3227) +03/19 01:04:55 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1654 (0.2111) +03/19 01:04:55 INFO train_distill_dimo.py:745] Train loss_pg: 0.5285 (0.0561) +03/19 01:04:55 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.3438 (-9.9450) +03/19 01:04:55 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4752 (8.4763) +03/19 01:04:55 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:04:55 INFO train_distill_dimo.py:667] < PROGRESS: 55.01% | SPEED: 2.907s / step | ETA: 3:38:01 > +03/19 01:07:22 INFO train_distill_dimo.py:734] Iteration 5550, lr_s=5.06e-06 lr_a=5.06e-06, time=2.83s +03/19 01:07:22 INFO train_distill_dimo.py:745] Train H_mean: 10.1562 (9.9944) +03/19 01:07:22 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2033 (-0.2034) +03/19 01:07:22 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2778 (0.3677) +03/19 01:07:22 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1614 (0.1899) +03/19 01:07:22 INFO train_distill_dimo.py:745] Train loss_pg: 0.7103 (0.3039) +03/19 01:07:22 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.1250 (-9.9469) +03/19 01:07:22 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4785 (8.4771) +03/19 01:07:22 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:09:49 INFO train_distill_dimo.py:734] Iteration 5600, lr_s=4.98e-06 lr_a=4.98e-06, time=2.73s +03/19 01:09:49 INFO train_distill_dimo.py:745] Train H_mean: 10.0625 (9.8112) +03/19 01:09:49 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1900 (-0.1898) +03/19 01:09:49 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2665 (0.3208) +03/19 01:09:49 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1389 (0.1737) +03/19 01:09:49 INFO train_distill_dimo.py:745] Train loss_pg: 0.5959 
(0.1673) +03/19 01:09:49 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.0938 (-9.8650) +03/19 01:09:49 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4749 (8.4757) +03/19 01:09:49 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:12:16 INFO train_distill_dimo.py:734] Iteration 5650, lr_s=4.91e-06 lr_a=4.91e-06, time=3.51s +03/19 01:12:16 INFO train_distill_dimo.py:745] Train H_mean: 10.0625 (10.1000) +03/19 01:12:16 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1840 (-0.1845) +03/19 01:12:16 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2906 (0.4084) +03/19 01:12:16 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1489 (0.2192) +03/19 01:12:16 INFO train_distill_dimo.py:745] Train loss_pg: 0.2721 (0.0404) +03/19 01:12:16 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.3438 (-10.2075) +03/19 01:12:16 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4765 (8.4772) +03/19 01:12:16 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:14:42 INFO train_distill_dimo.py:734] Iteration 5700, lr_s=4.83e-06 lr_a=4.83e-06, time=3.13s +03/19 01:14:42 INFO train_distill_dimo.py:745] Train H_mean: 9.7812 (9.4920) +03/19 01:14:42 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1779 (-0.1787) +03/19 01:14:42 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2850 (0.3683) +03/19 01:14:42 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1494 (0.2103) +03/19 01:14:42 INFO train_distill_dimo.py:745] Train loss_pg: 0.4650 (0.0708) +03/19 01:14:42 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.8125 (-9.6223) +03/19 01:14:42 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4750 (8.4761) +03/19 01:14:42 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:17:09 INFO train_distill_dimo.py:734] Iteration 5750, lr_s=4.76e-06 lr_a=4.76e-06, time=2.73s +03/19 01:17:09 INFO train_distill_dimo.py:745] Train H_mean: 10.1875 (10.2050) +03/19 01:17:09 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1735 (-0.1737) +03/19 01:17:09 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2796 (0.4347) +03/19 01:17:09 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1661 (0.2220) +03/19 01:17:09 INFO train_distill_dimo.py:745] Train loss_pg: 0.5280 (-0.0328) +03/19 01:17:09 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.2500 (-10.2150) +03/19 01:17:09 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4792 (8.4778) +03/19 01:17:09 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:19:37 INFO train_distill_dimo.py:734] Iteration 5800, lr_s=4.69e-06 lr_a=4.69e-06, time=2.70s +03/19 01:19:37 INFO train_distill_dimo.py:745] Train H_mean: 10.2188 (10.1325) +03/19 01:19:37 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1750 (-0.1748) +03/19 01:19:37 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2742 (0.2884) +03/19 01:19:37 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1492 (0.1636) +03/19 01:19:37 INFO train_distill_dimo.py:745] Train loss_pg: 0.5838 (-0.0799) +03/19 01:19:37 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.2812 (-10.1456) +03/19 01:19:37 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4762 (8.4771) +03/19 01:19:37 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:22:04 INFO train_distill_dimo.py:734] Iteration 5850, lr_s=4.61e-06 lr_a=4.61e-06, time=2.75s +03/19 01:22:04 INFO train_distill_dimo.py:745] Train H_mean: 10.2812 
(9.8875) +03/19 01:22:04 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1818 (-0.1817) +03/19 01:22:04 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2242 (0.3108) +03/19 01:22:04 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1343 (0.1703) +03/19 01:22:04 INFO train_distill_dimo.py:745] Train loss_pg: 0.4878 (-0.2189) +03/19 01:22:04 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.2188 (-9.8987) +03/19 01:22:04 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4788 (8.4771) +03/19 01:22:04 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:24:32 INFO train_distill_dimo.py:734] Iteration 5900, lr_s=4.54e-06 lr_a=4.54e-06, time=3.14s +03/19 01:24:32 INFO train_distill_dimo.py:745] Train H_mean: 9.9062 (9.7762) +03/19 01:24:32 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2155 (-0.2094) +03/19 01:24:32 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2382 (0.3119) +03/19 01:24:32 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1635 (0.2782) +03/19 01:24:32 INFO train_distill_dimo.py:745] Train loss_pg: 0.8916 (-0.6504) +03/19 01:24:32 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.9375 (-9.7556) +03/19 01:24:32 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4778 (8.4764) +03/19 01:24:32 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:27:00 INFO train_distill_dimo.py:734] Iteration 5950, lr_s=4.47e-06 lr_a=4.47e-06, time=3.60s +03/19 01:27:00 INFO train_distill_dimo.py:745] Train H_mean: 9.8750 (9.6919) +03/19 01:27:00 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2082 (-0.2076) +03/19 01:27:00 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2113 (0.3185) +03/19 01:27:00 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1349 (0.1785) +03/19 01:27:00 INFO train_distill_dimo.py:745] Train loss_pg: 0.7632 (0.2363) +03/19 01:27:00 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.8438 (-9.6456) +03/19 01:27:00 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4755 (8.4752) +03/19 01:27:00 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:29:26 INFO train_distill_dimo.py:734] Iteration 6000, lr_s=4.40e-06 lr_a=4.40e-06, time=2.71s +03/19 01:29:26 INFO train_distill_dimo.py:745] Train H_mean: 10.5938 (10.2137) +03/19 01:29:26 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1987 (-0.1987) +03/19 01:29:26 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2726 (0.4193) +03/19 01:29:26 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1659 (0.2305) +03/19 01:29:26 INFO train_distill_dimo.py:745] Train loss_pg: 0.7214 (-0.0423) +03/19 01:29:26 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5938 (-10.2250) +03/19 01:29:26 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4776 (8.4772) +03/19 01:29:26 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:29:26 INFO train_distill_dimo.py:667] < PROGRESS: 60.01% | SPEED: 2.910s / step | ETA: 3:14:00 > +03/19 01:29:45 INFO train_distill_dimo.py:720] [save] step=6000 → ./experiments/distill_dimo/checkpoints/checkpoint-6000 +03/19 01:32:12 INFO train_distill_dimo.py:734] Iteration 6050, lr_s=4.32e-06 lr_a=4.32e-06, time=2.72s +03/19 01:32:12 INFO train_distill_dimo.py:745] Train H_mean: 10.4688 (10.2025) +03/19 01:32:12 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1915 (-0.1924) +03/19 01:32:12 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3115 (0.4157) +03/19 01:32:12 INFO 
train_distill_dimo.py:745] Train loss_kd_cond: 0.1705 (0.2185) +03/19 01:32:12 INFO train_distill_dimo.py:745] Train loss_pg: 0.6358 (0.1425) +03/19 01:32:12 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.4688 (-10.2188) +03/19 01:32:12 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4774 (8.4778) +03/19 01:32:12 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:34:39 INFO train_distill_dimo.py:734] Iteration 6100, lr_s=4.25e-06 lr_a=4.25e-06, time=2.75s +03/19 01:34:39 INFO train_distill_dimo.py:745] Train H_mean: 10.5938 (10.1562) +03/19 01:34:39 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1771 (-0.1785) +03/19 01:34:39 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2499 (0.3686) +03/19 01:34:39 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1509 (0.2110) +03/19 01:34:39 INFO train_distill_dimo.py:745] Train loss_pg: 0.7567 (0.2160) +03/19 01:34:39 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5625 (-10.1625) +03/19 01:34:39 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4765 (8.4764) +03/19 01:34:39 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:37:06 INFO train_distill_dimo.py:734] Iteration 6150, lr_s=4.18e-06 lr_a=4.18e-06, time=2.71s +03/19 01:37:06 INFO train_distill_dimo.py:745] Train H_mean: 10.4688 (10.2400) +03/19 01:37:06 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1757 (-0.1760) +03/19 01:37:06 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2556 (0.3388) +03/19 01:37:06 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1508 (0.1892) +03/19 01:37:06 INFO train_distill_dimo.py:745] Train loss_pg: 0.4774 (-0.0983) +03/19 01:37:06 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5312 (-10.2525) +03/19 01:37:06 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4784 (8.4777) +03/19 01:37:06 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:39:34 INFO train_distill_dimo.py:734] Iteration 6200, lr_s=4.11e-06 lr_a=4.11e-06, time=3.57s +03/19 01:39:34 INFO train_distill_dimo.py:745] Train H_mean: 9.9375 (10.1775) +03/19 01:39:34 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1845 (-0.1839) +03/19 01:39:34 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2359 (0.3874) +03/19 01:39:34 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1548 (0.2220) +03/19 01:39:34 INFO train_distill_dimo.py:745] Train loss_pg: 0.5494 (-0.3012) +03/19 01:39:34 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.9375 (-10.1638) +03/19 01:39:34 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4780 (8.4765) +03/19 01:39:34 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:42:00 INFO train_distill_dimo.py:734] Iteration 6250, lr_s=4.04e-06 lr_a=4.04e-06, time=3.51s +03/19 01:42:00 INFO train_distill_dimo.py:745] Train H_mean: 10.3750 (10.1175) +03/19 01:42:00 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2001 (-0.1994) +03/19 01:42:00 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2952 (0.4162) +03/19 01:42:00 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1504 (0.2298) +03/19 01:42:00 INFO train_distill_dimo.py:745] Train loss_pg: 0.5741 (-0.2509) +03/19 01:42:00 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.3750 (-10.0800) +03/19 01:42:00 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4769 (8.4767) +03/19 01:42:00 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:44:26 INFO 
train_distill_dimo.py:734] Iteration 6300, lr_s=3.97e-06 lr_a=3.97e-06, time=2.72s +03/19 01:44:26 INFO train_distill_dimo.py:745] Train H_mean: 10.6250 (10.2381) +03/19 01:44:26 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1876 (-0.1860) +03/19 01:44:26 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2596 (0.3065) +03/19 01:44:26 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1490 (0.1804) +03/19 01:44:26 INFO train_distill_dimo.py:745] Train loss_pg: 0.8354 (0.4166) +03/19 01:44:26 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5938 (-10.2425) +03/19 01:44:26 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4774 (8.4775) +03/19 01:44:26 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:46:54 INFO train_distill_dimo.py:734] Iteration 6350, lr_s=3.90e-06 lr_a=3.90e-06, time=2.74s +03/19 01:46:54 INFO train_distill_dimo.py:745] Train H_mean: 10.7188 (10.2737) +03/19 01:46:54 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1777 (-0.1781) +03/19 01:46:54 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2901 (0.3367) +03/19 01:46:54 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1792 (0.1998) +03/19 01:46:54 INFO train_distill_dimo.py:745] Train loss_pg: 0.4335 (-0.0818) +03/19 01:46:54 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.7500 (-10.2788) +03/19 01:46:54 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4791 (8.4774) +03/19 01:46:54 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:49:22 INFO train_distill_dimo.py:734] Iteration 6400, lr_s=3.83e-06 lr_a=3.83e-06, time=2.74s +03/19 01:49:22 INFO train_distill_dimo.py:745] Train H_mean: 10.4062 (10.1862) +03/19 01:49:22 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1692 (-0.1706) +03/19 01:49:22 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2449 (0.3181) +03/19 01:49:22 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1422 (0.1788) +03/19 01:49:22 INFO train_distill_dimo.py:745] Train loss_pg: 0.5358 (0.3002) +03/19 01:49:22 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.4062 (-10.1988) +03/19 01:49:22 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4786 (8.4775) +03/19 01:49:22 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:51:49 INFO train_distill_dimo.py:734] Iteration 6450, lr_s=3.76e-06 lr_a=3.76e-06, time=3.50s +03/19 01:51:49 INFO train_distill_dimo.py:745] Train H_mean: 10.1875 (9.9702) +03/19 01:51:49 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1650 (-0.1657) +03/19 01:51:49 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2516 (0.4120) +03/19 01:51:49 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1474 (0.2296) +03/19 01:51:49 INFO train_distill_dimo.py:745] Train loss_pg: 0.5119 (-0.2050) +03/19 01:51:49 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.0938 (-9.9619) +03/19 01:51:49 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4761 (8.4770) +03/19 01:51:49 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:54:15 INFO train_distill_dimo.py:734] Iteration 6500, lr_s=3.69e-06 lr_a=3.69e-06, time=3.18s +03/19 01:54:15 INFO train_distill_dimo.py:745] Train H_mean: 10.6250 (10.1387) +03/19 01:54:15 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1703 (-0.1696) +03/19 01:54:15 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2399 (0.3501) +03/19 01:54:15 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1411 (0.2045) +03/19 01:54:15 INFO 
train_distill_dimo.py:745] Train loss_pg: 0.6155 (0.1213) +03/19 01:54:15 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.6875 (-10.1575) +03/19 01:54:15 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4768 (8.4768) +03/19 01:54:15 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:54:15 INFO train_distill_dimo.py:667] < PROGRESS: 65.01% | SPEED: 2.912s / step | ETA: 2:49:53 > +03/19 01:56:41 INFO train_distill_dimo.py:734] Iteration 6550, lr_s=3.63e-06 lr_a=3.63e-06, time=2.74s +03/19 01:56:41 INFO train_distill_dimo.py:745] Train H_mean: 10.3438 (10.2688) +03/19 01:56:41 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1720 (-0.1722) +03/19 01:56:41 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2843 (0.3606) +03/19 01:56:41 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1632 (0.1943) +03/19 01:56:41 INFO train_distill_dimo.py:745] Train loss_pg: 0.0754 (-0.3143) +03/19 01:56:41 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.2812 (-10.2700) +03/19 01:56:41 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4775 (8.4785) +03/19 01:56:41 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 01:59:09 INFO train_distill_dimo.py:734] Iteration 6600, lr_s=3.56e-06 lr_a=3.56e-06, time=2.73s +03/19 01:59:09 INFO train_distill_dimo.py:745] Train H_mean: 10.7812 (10.3562) +03/19 01:59:09 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1793 (-0.1793) +03/19 01:59:09 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2352 (0.3641) +03/19 01:59:09 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1507 (0.2207) +03/19 01:59:09 INFO train_distill_dimo.py:745] Train loss_pg: 0.5184 (-0.0410) +03/19 01:59:09 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.7812 (-10.3613) +03/19 01:59:09 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4790 (8.4785) +03/19 01:59:09 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:01:36 INFO train_distill_dimo.py:734] Iteration 6650, lr_s=3.49e-06 lr_a=3.49e-06, time=2.69s +03/19 02:01:36 INFO train_distill_dimo.py:745] Train H_mean: 10.1875 (10.0819) +03/19 02:01:36 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1713 (-0.1715) +03/19 02:01:36 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2440 (0.3292) +03/19 02:01:36 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1429 (0.1895) +03/19 02:01:36 INFO train_distill_dimo.py:745] Train loss_pg: 0.4424 (0.1293) +03/19 02:01:36 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.1875 (-10.0600) +03/19 02:01:36 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4760 (8.4767) +03/19 02:01:36 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:04:02 INFO train_distill_dimo.py:734] Iteration 6700, lr_s=3.43e-06 lr_a=3.43e-06, time=2.75s +03/19 02:04:02 INFO train_distill_dimo.py:745] Train H_mean: 10.5312 (10.2312) +03/19 02:04:02 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1666 (-0.1670) +03/19 02:04:02 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2346 (0.3539) +03/19 02:04:02 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1346 (0.1886) +03/19 02:04:02 INFO train_distill_dimo.py:745] Train loss_pg: 0.4205 (0.1019) +03/19 02:04:02 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5000 (-10.2375) +03/19 02:04:02 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4777 (8.4776) +03/19 02:04:02 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:06:29 
INFO train_distill_dimo.py:734] Iteration 6750, lr_s=3.36e-06 lr_a=3.36e-06, time=3.18s +03/19 02:06:29 INFO train_distill_dimo.py:745] Train H_mean: 10.6250 (10.2175) +03/19 02:06:29 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1673 (-0.1688) +03/19 02:06:29 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2403 (0.3262) +03/19 02:06:29 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1337 (0.1757) +03/19 02:06:29 INFO train_distill_dimo.py:745] Train loss_pg: 0.4131 (-0.3111) +03/19 02:06:29 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.6562 (-10.2263) +03/19 02:06:29 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4763 (8.4765) +03/19 02:06:29 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:08:55 INFO train_distill_dimo.py:734] Iteration 6800, lr_s=3.29e-06 lr_a=3.29e-06, time=3.50s +03/19 02:08:55 INFO train_distill_dimo.py:745] Train H_mean: 10.5938 (10.2888) +03/19 02:08:55 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1826 (-0.1820) +03/19 02:08:55 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3184 (0.4081) +03/19 02:08:55 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1816 (0.2110) +03/19 02:08:55 INFO train_distill_dimo.py:745] Train loss_pg: 0.4983 (-0.1676) +03/19 02:08:55 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.6562 (-10.2950) +03/19 02:08:55 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4771 (8.4776) +03/19 02:08:55 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:11:20 INFO train_distill_dimo.py:734] Iteration 6850, lr_s=3.23e-06 lr_a=3.23e-06, time=2.75s +03/19 02:11:20 INFO train_distill_dimo.py:745] Train H_mean: 10.1250 (10.0053) +03/19 02:11:20 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2113 (-0.2090) +03/19 02:11:20 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3061 (0.4119) +03/19 02:11:20 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1693 (0.2922) +03/19 02:11:20 INFO train_distill_dimo.py:745] Train loss_pg: 0.5662 (-0.6005) +03/19 02:11:20 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.0938 (-10.0181) +03/19 02:11:20 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4782 (8.4772) +03/19 02:11:20 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:13:48 INFO train_distill_dimo.py:734] Iteration 6900, lr_s=3.17e-06 lr_a=3.17e-06, time=2.81s +03/19 02:13:48 INFO train_distill_dimo.py:745] Train H_mean: 10.1562 (10.1125) +03/19 02:13:48 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1980 (-0.1991) +03/19 02:13:48 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2468 (0.3468) +03/19 02:13:48 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1371 (0.1826) +03/19 02:13:48 INFO train_distill_dimo.py:745] Train loss_pg: 0.8509 (0.3520) +03/19 02:13:48 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.1562 (-10.1225) +03/19 02:13:48 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4781 (8.4769) +03/19 02:13:48 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:16:15 INFO train_distill_dimo.py:734] Iteration 6950, lr_s=3.10e-06 lr_a=3.10e-06, time=2.72s +03/19 02:16:15 INFO train_distill_dimo.py:745] Train H_mean: 10.6562 (10.0563) +03/19 02:16:15 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1862 (-0.1861) +03/19 02:16:15 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.3252 (0.3967) +03/19 02:16:15 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1662 (0.2154) +03/19 02:16:15 
INFO train_distill_dimo.py:745] Train loss_pg: 0.4584 (0.0483) +03/19 02:16:15 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.6562 (-10.0581) +03/19 02:16:15 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4771 (8.4775) +03/19 02:16:15 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:18:41 INFO train_distill_dimo.py:734] Iteration 7000, lr_s=3.04e-06 lr_a=3.04e-06, time=3.11s +03/19 02:18:41 INFO train_distill_dimo.py:745] Train H_mean: 9.8125 (10.0544) +03/19 02:18:41 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1841 (-0.1845) +03/19 02:18:41 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2884 (0.4013) +03/19 02:18:41 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1451 (0.2079) +03/19 02:18:41 INFO train_distill_dimo.py:745] Train loss_pg: 0.5175 (-0.0960) +03/19 02:18:41 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.8125 (-10.0719) +03/19 02:18:41 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4787 (8.4777) +03/19 02:18:41 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:18:41 INFO train_distill_dimo.py:667] < PROGRESS: 70.01% | SPEED: 2.914s / step | ETA: 2:25:40 > +03/19 02:18:59 INFO train_distill_dimo.py:720] [save] step=7000 → ./experiments/distill_dimo/checkpoints/checkpoint-7000 +03/19 02:21:24 INFO train_distill_dimo.py:734] Iteration 7050, lr_s=2.98e-06 lr_a=2.98e-06, time=2.71s +03/19 02:21:24 INFO train_distill_dimo.py:745] Train H_mean: 9.8438 (9.9242) +03/19 02:21:24 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1762 (-0.1771) +03/19 02:21:24 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2395 (0.4564) +03/19 02:21:24 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1399 (0.2126) +03/19 02:21:24 INFO train_distill_dimo.py:745] Train loss_pg: 0.6351 (0.2189) +03/19 02:21:24 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.8750 (-9.9273) +03/19 02:21:24 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4768 (8.4767) +03/19 02:21:24 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:23:52 INFO train_distill_dimo.py:734] Iteration 7100, lr_s=2.92e-06 lr_a=2.92e-06, time=2.74s +03/19 02:23:52 INFO train_distill_dimo.py:745] Train H_mean: 10.7500 (10.3138) +03/19 02:23:52 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1761 (-0.1757) +03/19 02:23:52 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2542 (0.3083) +03/19 02:23:52 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1415 (0.1714) +03/19 02:23:52 INFO train_distill_dimo.py:745] Train loss_pg: 0.4394 (-0.0397) +03/19 02:23:52 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.7500 (-10.3275) +03/19 02:23:52 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4786 (8.4790) +03/19 02:23:52 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:26:21 INFO train_distill_dimo.py:734] Iteration 7150, lr_s=2.86e-06 lr_a=2.86e-06, time=2.71s +03/19 02:26:21 INFO train_distill_dimo.py:745] Train H_mean: 10.5312 (10.2800) +03/19 02:26:21 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1712 (-0.1706) +03/19 02:26:21 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2842 (0.3456) +03/19 02:26:21 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1642 (0.1885) +03/19 02:26:21 INFO train_distill_dimo.py:745] Train loss_pg: 0.3110 (-0.0203) +03/19 02:26:21 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5312 (-10.2900) +03/19 02:26:21 INFO train_distill_dimo.py:745] Train 
tok_entropy: 8.4770 (8.4768) +03/19 02:26:21 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:28:49 INFO train_distill_dimo.py:734] Iteration 7200, lr_s=2.80e-06 lr_a=2.80e-06, time=2.74s +03/19 02:28:49 INFO train_distill_dimo.py:745] Train H_mean: 10.6250 (10.2525) +03/19 02:28:49 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1732 (-0.1742) +03/19 02:28:49 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2253 (0.3568) +03/19 02:28:49 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1300 (0.1852) +03/19 02:28:49 INFO train_distill_dimo.py:745] Train loss_pg: 0.7309 (-0.0390) +03/19 02:28:49 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5938 (-10.2625) +03/19 02:28:49 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4777 (8.4770) +03/19 02:28:49 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:31:17 INFO train_distill_dimo.py:734] Iteration 7250, lr_s=2.74e-06 lr_a=2.74e-06, time=2.71s +03/19 02:31:17 INFO train_distill_dimo.py:745] Train H_mean: 10.5938 (10.2600) +03/19 02:31:17 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1655 (-0.1656) +03/19 02:31:17 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2497 (0.3025) +03/19 02:31:17 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1463 (0.1647) +03/19 02:31:17 INFO train_distill_dimo.py:745] Train loss_pg: 0.4490 (0.0991) +03/19 02:31:17 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.4688 (-10.2613) +03/19 02:31:17 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4772 (8.4783) +03/19 02:31:17 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:33:45 INFO train_distill_dimo.py:734] Iteration 7300, lr_s=2.68e-06 lr_a=2.68e-06, time=3.52s +03/19 02:33:45 INFO train_distill_dimo.py:745] Train H_mean: 10.8125 (10.2700) +03/19 02:33:45 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1670 (-0.1669) +03/19 02:33:45 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2348 (0.2747) +03/19 02:33:45 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1274 (0.1627) +03/19 02:33:45 INFO train_distill_dimo.py:745] Train loss_pg: 0.3767 (-0.1435) +03/19 02:33:45 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.8438 (-10.2762) +03/19 02:33:45 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4785 (8.4763) +03/19 02:33:45 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:36:12 INFO train_distill_dimo.py:734] Iteration 7350, lr_s=2.62e-06 lr_a=2.62e-06, time=3.14s +03/19 02:36:12 INFO train_distill_dimo.py:745] Train H_mean: 10.6250 (10.2312) +03/19 02:36:12 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1703 (-0.1728) +03/19 02:36:12 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2291 (0.3137) +03/19 02:36:12 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1327 (0.1742) +03/19 02:36:12 INFO train_distill_dimo.py:745] Train loss_pg: 0.2598 (-0.3126) +03/19 02:36:12 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.6250 (-10.2350) +03/19 02:36:12 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4765 (8.4772) +03/19 02:36:12 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:38:39 INFO train_distill_dimo.py:734] Iteration 7400, lr_s=2.56e-06 lr_a=2.56e-06, time=2.70s +03/19 02:38:39 INFO train_distill_dimo.py:745] Train H_mean: 10.5312 (10.2263) +03/19 02:38:39 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1764 (-0.1772) +03/19 02:38:39 INFO train_distill_dimo.py:745] 
Train loss_aux_cond: 0.2629 (0.3446) +03/19 02:38:39 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1467 (0.2056) +03/19 02:38:39 INFO train_distill_dimo.py:745] Train loss_pg: 0.5025 (0.1134) +03/19 02:38:39 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.6250 (-10.2438) +03/19 02:38:39 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4793 (8.4779) +03/19 02:38:39 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:41:08 INFO train_distill_dimo.py:734] Iteration 7450, lr_s=2.51e-06 lr_a=2.51e-06, time=2.71s +03/19 02:41:08 INFO train_distill_dimo.py:745] Train H_mean: 10.6875 (10.1450) +03/19 02:41:08 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1723 (-0.1723) +03/19 02:41:08 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2795 (0.3529) +03/19 02:41:08 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1549 (0.1907) +03/19 02:41:08 INFO train_distill_dimo.py:745] Train loss_pg: 0.3105 (-0.2251) +03/19 02:41:08 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.7188 (-10.1487) +03/19 02:41:08 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4768 (8.4778) +03/19 02:41:08 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:43:37 INFO train_distill_dimo.py:734] Iteration 7500, lr_s=2.45e-06 lr_a=2.45e-06, time=2.73s +03/19 02:43:37 INFO train_distill_dimo.py:745] Train H_mean: 10.0625 (9.8313) +03/19 02:43:37 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1755 (-0.1751) +03/19 02:43:37 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2448 (0.2812) +03/19 02:43:37 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1527 (0.1583) +03/19 02:43:37 INFO train_distill_dimo.py:745] Train loss_pg: 0.4603 (0.1545) +03/19 02:43:37 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.0938 (-9.8444) +03/19 02:43:37 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4766 (8.4776) +03/19 02:43:37 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:43:37 INFO train_distill_dimo.py:667] < PROGRESS: 75.01% | SPEED: 2.917s / step | ETA: 2:01:31 > +03/19 02:46:04 INFO train_distill_dimo.py:734] Iteration 7550, lr_s=2.40e-06 lr_a=2.40e-06, time=3.28s +03/19 02:46:04 INFO train_distill_dimo.py:745] Train H_mean: 10.0625 (10.2019) +03/19 02:46:04 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1661 (-0.1668) +03/19 02:46:04 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2437 (0.3183) +03/19 02:46:04 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1277 (0.1773) +03/19 02:46:04 INFO train_distill_dimo.py:745] Train loss_pg: 0.5998 (-0.1095) +03/19 02:46:04 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.1250 (-10.1994) +03/19 02:46:04 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4794 (8.4786) +03/19 02:46:04 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:48:33 INFO train_distill_dimo.py:734] Iteration 7600, lr_s=2.35e-06 lr_a=2.35e-06, time=3.16s +03/19 02:48:33 INFO train_distill_dimo.py:745] Train H_mean: 10.5000 (10.2637) +03/19 02:48:33 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1757 (-0.1767) +03/19 02:48:33 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2153 (0.2749) +03/19 02:48:33 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1238 (0.1517) +03/19 02:48:33 INFO train_distill_dimo.py:745] Train loss_pg: 0.7250 (0.1025) +03/19 02:48:33 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5000 (-10.2762) +03/19 02:48:33 INFO 
train_distill_dimo.py:745] Train tok_entropy: 8.4774 (8.4768) +03/19 02:48:33 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:51:01 INFO train_distill_dimo.py:734] Iteration 7650, lr_s=2.29e-06 lr_a=2.29e-06, time=3.14s +03/19 02:51:01 INFO train_distill_dimo.py:745] Train H_mean: 10.3438 (10.1713) +03/19 02:51:01 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1620 (-0.1630) +03/19 02:51:01 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2364 (0.3393) +03/19 02:51:01 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1347 (0.1874) +03/19 02:51:01 INFO train_distill_dimo.py:745] Train loss_pg: 0.4474 (0.0236) +03/19 02:51:01 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.3438 (-10.1788) +03/19 02:51:01 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4761 (8.4770) +03/19 02:51:01 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:53:30 INFO train_distill_dimo.py:734] Iteration 7700, lr_s=2.24e-06 lr_a=2.24e-06, time=2.75s +03/19 02:53:30 INFO train_distill_dimo.py:745] Train H_mean: 10.5938 (10.2163) +03/19 02:53:30 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1654 (-0.1663) +03/19 02:53:30 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2335 (0.3191) +03/19 02:53:30 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1440 (0.1937) +03/19 02:53:30 INFO train_distill_dimo.py:745] Train loss_pg: 0.4220 (-0.1591) +03/19 02:53:30 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5938 (-10.2275) +03/19 02:53:30 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4779 (8.4774) +03/19 02:53:30 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:55:58 INFO train_distill_dimo.py:734] Iteration 7750, lr_s=2.19e-06 lr_a=2.19e-06, time=2.73s +03/19 02:55:58 INFO train_distill_dimo.py:745] Train H_mean: 10.6562 (10.1900) +03/19 02:55:58 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1686 (-0.1693) +03/19 02:55:58 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2475 (0.3011) +03/19 02:55:58 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1429 (0.1661) +03/19 02:55:58 INFO train_distill_dimo.py:745] Train loss_pg: 0.6219 (0.0598) +03/19 02:55:58 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.6562 (-10.1937) +03/19 02:55:58 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4792 (8.4782) +03/19 02:55:58 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 02:58:27 INFO train_distill_dimo.py:734] Iteration 7800, lr_s=2.14e-06 lr_a=2.14e-06, time=2.80s +03/19 02:58:27 INFO train_distill_dimo.py:745] Train H_mean: 10.2500 (10.0084) +03/19 02:58:27 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1713 (-0.1710) +03/19 02:58:27 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2684 (0.3190) +03/19 02:58:27 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1436 (0.1814) +03/19 02:58:27 INFO train_distill_dimo.py:745] Train loss_pg: 0.1543 (-0.1339) +03/19 02:58:27 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.2188 (-10.0028) +03/19 02:58:27 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4770 (8.4776) +03/19 02:58:27 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:00:55 INFO train_distill_dimo.py:734] Iteration 7850, lr_s=2.09e-06 lr_a=2.09e-06, time=3.93s +03/19 03:00:55 INFO train_distill_dimo.py:745] Train H_mean: 10.1250 (9.9356) +03/19 03:00:55 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1697 (-0.1710) +03/19 03:00:55 INFO 
train_distill_dimo.py:745] Train loss_aux_cond: 0.2136 (0.3007) +03/19 03:00:55 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1339 (0.1741) +03/19 03:00:55 INFO train_distill_dimo.py:745] Train loss_pg: 0.4387 (-0.2156) +03/19 03:00:55 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.1250 (-9.9467) +03/19 03:00:55 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4770 (8.4771) +03/19 03:00:55 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:03:22 INFO train_distill_dimo.py:734] Iteration 7900, lr_s=2.04e-06 lr_a=2.04e-06, time=3.15s +03/19 03:03:22 INFO train_distill_dimo.py:745] Train H_mean: 10.1562 (10.1419) +03/19 03:03:22 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1734 (-0.1720) +03/19 03:03:22 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2539 (0.3537) +03/19 03:03:22 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1461 (0.1987) +03/19 03:03:22 INFO train_distill_dimo.py:745] Train loss_pg: 0.6415 (0.3313) +03/19 03:03:22 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.1875 (-10.1475) +03/19 03:03:22 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4760 (8.4780) +03/19 03:03:22 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:05:50 INFO train_distill_dimo.py:734] Iteration 7950, lr_s=2.00e-06 lr_a=2.00e-06, time=2.71s +03/19 03:05:50 INFO train_distill_dimo.py:745] Train H_mean: 10.5625 (10.2375) +03/19 03:05:50 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1567 (-0.1566) +03/19 03:05:50 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2596 (0.3554) +03/19 03:05:50 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1485 (0.1904) +03/19 03:05:50 INFO train_distill_dimo.py:745] Train loss_pg: 0.4677 (0.0916) +03/19 03:05:50 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5938 (-10.2438) +03/19 03:05:50 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4804 (8.4780) +03/19 03:05:50 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:08:19 INFO train_distill_dimo.py:734] Iteration 8000, lr_s=1.95e-06 lr_a=1.95e-06, time=2.73s +03/19 03:08:19 INFO train_distill_dimo.py:745] Train H_mean: 10.5000 (10.1962) +03/19 03:08:19 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1548 (-0.1557) +03/19 03:08:19 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2431 (0.2850) +03/19 03:08:19 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1226 (0.1543) +03/19 03:08:19 INFO train_distill_dimo.py:745] Train loss_pg: 0.0326 (-0.2131) +03/19 03:08:19 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5312 (-10.2050) +03/19 03:08:19 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4776 (8.4768) +03/19 03:08:19 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:08:19 INFO train_distill_dimo.py:667] < PROGRESS: 80.01% | SPEED: 2.919s / step | ETA: 1:37:18 > +03/19 03:08:38 INFO train_distill_dimo.py:720] [save] step=8000 → ./experiments/distill_dimo/checkpoints/checkpoint-8000 +03/19 03:11:04 INFO train_distill_dimo.py:734] Iteration 8050, lr_s=1.90e-06 lr_a=1.90e-06, time=2.69s +03/19 03:11:04 INFO train_distill_dimo.py:745] Train H_mean: 10.6250 (10.1738) +03/19 03:11:04 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1593 (-0.1595) +03/19 03:11:04 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2148 (0.2721) +03/19 03:11:04 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1288 (0.1563) +03/19 03:11:04 INFO train_distill_dimo.py:745] Train loss_pg: 
0.5467 (-0.1186) +03/19 03:11:04 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.6562 (-10.1862) +03/19 03:11:04 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4783 (8.4772) +03/19 03:11:04 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:13:31 INFO train_distill_dimo.py:734] Iteration 8100, lr_s=1.86e-06 lr_a=1.86e-06, time=3.09s +03/19 03:13:31 INFO train_distill_dimo.py:745] Train H_mean: 10.5000 (10.2613) +03/19 03:13:31 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1645 (-0.1660) +03/19 03:13:31 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2390 (0.3833) +03/19 03:13:31 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1361 (0.1893) +03/19 03:13:31 INFO train_distill_dimo.py:745] Train loss_pg: 0.5501 (-0.1406) +03/19 03:13:31 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5000 (-10.2662) +03/19 03:13:31 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4793 (8.4792) +03/19 03:13:31 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:15:59 INFO train_distill_dimo.py:734] Iteration 8150, lr_s=1.82e-06 lr_a=1.82e-06, time=3.20s +03/19 03:15:59 INFO train_distill_dimo.py:745] Train H_mean: 10.6250 (10.2863) +03/19 03:15:59 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1655 (-0.1657) +03/19 03:15:59 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2241 (0.4463) +03/19 03:15:59 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1485 (0.2235) +03/19 03:15:59 INFO train_distill_dimo.py:745] Train loss_pg: 0.4240 (-0.1302) +03/19 03:15:59 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.6250 (-10.2950) +03/19 03:15:59 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4806 (8.4779) +03/19 03:15:59 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:18:24 INFO train_distill_dimo.py:734] Iteration 8200, lr_s=1.77e-06 lr_a=1.77e-06, time=3.16s +03/19 03:18:24 INFO train_distill_dimo.py:745] Train H_mean: 10.7188 (10.3125) +03/19 03:18:24 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1956 (-0.1912) +03/19 03:18:24 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2578 (0.3252) +03/19 03:18:24 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1442 (0.2148) +03/19 03:18:24 INFO train_distill_dimo.py:745] Train loss_pg: 0.6618 (-0.4621) +03/19 03:18:24 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.7188 (-10.3075) +03/19 03:18:24 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4813 (8.4799) +03/19 03:18:24 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:20:51 INFO train_distill_dimo.py:734] Iteration 8250, lr_s=1.73e-06 lr_a=1.73e-06, time=2.74s +03/19 03:20:51 INFO train_distill_dimo.py:745] Train H_mean: 10.5312 (10.2325) +03/19 03:20:51 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2004 (-0.2012) +03/19 03:20:51 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2269 (0.2719) +03/19 03:20:51 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1441 (0.1655) +03/19 03:20:51 INFO train_distill_dimo.py:745] Train loss_pg: 0.6088 (0.0244) +03/19 03:20:51 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5312 (-10.2500) +03/19 03:20:51 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4789 (8.4785) +03/19 03:20:51 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:23:18 INFO train_distill_dimo.py:734] Iteration 8300, lr_s=1.69e-06 lr_a=1.69e-06, time=2.73s +03/19 03:23:18 INFO train_distill_dimo.py:745] Train 
H_mean: 9.9688 (9.9478) +03/19 03:23:18 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1961 (-0.1980) +03/19 03:23:18 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2211 (0.3096) +03/19 03:23:18 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1258 (0.2068) +03/19 03:23:18 INFO train_distill_dimo.py:745] Train loss_pg: 0.9351 (0.0746) +03/19 03:23:18 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.0312 (-9.9541) +03/19 03:23:18 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4745 (8.4771) +03/19 03:23:18 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:25:45 INFO train_distill_dimo.py:734] Iteration 8350, lr_s=1.65e-06 lr_a=1.65e-06, time=2.76s +03/19 03:25:45 INFO train_distill_dimo.py:745] Train H_mean: 10.7188 (10.2850) +03/19 03:25:45 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1885 (-0.1886) +03/19 03:25:45 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2167 (0.3178) +03/19 03:25:45 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1270 (0.1835) +03/19 03:25:45 INFO train_distill_dimo.py:745] Train loss_pg: 0.6028 (0.1538) +03/19 03:25:45 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.7188 (-10.2837) +03/19 03:25:45 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4797 (8.4779) +03/19 03:25:45 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:28:12 INFO train_distill_dimo.py:734] Iteration 8400, lr_s=1.62e-06 lr_a=1.62e-06, time=3.62s +03/19 03:28:12 INFO train_distill_dimo.py:745] Train H_mean: 10.0938 (10.1625) +03/19 03:28:12 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1789 (-0.1797) +03/19 03:28:12 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2319 (0.3285) +03/19 03:28:12 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1297 (0.1818) +03/19 03:28:12 INFO train_distill_dimo.py:745] Train loss_pg: 0.1971 (-0.0505) +03/19 03:28:12 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.1250 (-10.1650) +03/19 03:28:12 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4797 (8.4788) +03/19 03:28:12 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:30:39 INFO train_distill_dimo.py:734] Iteration 8450, lr_s=1.58e-06 lr_a=1.58e-06, time=3.18s +03/19 03:30:39 INFO train_distill_dimo.py:745] Train H_mean: 10.3750 (10.2150) +03/19 03:30:39 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1716 (-0.1721) +03/19 03:30:39 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2332 (0.3410) +03/19 03:30:39 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1459 (0.1994) +03/19 03:30:39 INFO train_distill_dimo.py:745] Train loss_pg: 0.5044 (0.1783) +03/19 03:30:39 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.2812 (-10.2188) +03/19 03:30:39 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4780 (8.4786) +03/19 03:30:39 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:33:05 INFO train_distill_dimo.py:734] Iteration 8500, lr_s=1.54e-06 lr_a=1.54e-06, time=2.76s +03/19 03:33:05 INFO train_distill_dimo.py:745] Train H_mean: 10.7812 (10.2788) +03/19 03:33:05 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1649 (-0.1649) +03/19 03:33:05 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2402 (0.3223) +03/19 03:33:05 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1399 (0.1741) +03/19 03:33:05 INFO train_distill_dimo.py:745] Train loss_pg: 0.3409 (0.0329) +03/19 03:33:05 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.7812 
(-10.2869) +03/19 03:33:05 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4786 (8.4769) +03/19 03:33:05 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:33:05 INFO train_distill_dimo.py:667] < PROGRESS: 85.01% | SPEED: 2.920s / step | ETA: 1:13:00 > +03/19 03:35:31 INFO train_distill_dimo.py:734] Iteration 8550, lr_s=1.51e-06 lr_a=1.51e-06, time=2.79s +03/19 03:35:31 INFO train_distill_dimo.py:745] Train H_mean: 10.5625 (10.2487) +03/19 03:35:31 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1714 (-0.1770) +03/19 03:35:31 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2535 (0.3397) +03/19 03:35:31 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1512 (0.2254) +03/19 03:35:31 INFO train_distill_dimo.py:745] Train loss_pg: 0.2132 (-0.5309) +03/19 03:35:31 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5938 (-10.2613) +03/19 03:35:31 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4756 (8.4774) +03/19 03:35:31 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:37:59 INFO train_distill_dimo.py:734] Iteration 8600, lr_s=1.47e-06 lr_a=1.47e-06, time=2.75s +03/19 03:37:59 INFO train_distill_dimo.py:745] Train H_mean: 10.6562 (10.2075) +03/19 03:37:59 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1882 (-0.1882) +03/19 03:37:59 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2183 (0.2683) +03/19 03:37:59 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1179 (0.1446) +03/19 03:37:59 INFO train_distill_dimo.py:745] Train loss_pg: 0.7358 (0.2574) +03/19 03:37:59 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.6562 (-10.2163) +03/19 03:37:59 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4806 (8.4776) +03/19 03:37:59 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:40:26 INFO train_distill_dimo.py:734] Iteration 8650, lr_s=1.44e-06 lr_a=1.44e-06, time=2.75s +03/19 03:40:26 INFO train_distill_dimo.py:745] Train H_mean: 10.5938 (10.2175) +03/19 03:40:26 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1873 (-0.1860) +03/19 03:40:26 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2706 (0.3404) +03/19 03:40:26 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1441 (0.1832) +03/19 03:40:26 INFO train_distill_dimo.py:745] Train loss_pg: 0.7116 (-0.1384) +03/19 03:40:26 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5625 (-10.2263) +03/19 03:40:26 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4782 (8.4773) +03/19 03:40:26 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:42:51 INFO train_distill_dimo.py:734] Iteration 8700, lr_s=1.41e-06 lr_a=1.41e-06, time=2.71s +03/19 03:42:52 INFO train_distill_dimo.py:745] Train H_mean: 10.3125 (10.1650) +03/19 03:42:52 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1832 (-0.1836) +03/19 03:42:52 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2346 (0.2892) +03/19 03:42:52 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1345 (0.1614) +03/19 03:42:52 INFO train_distill_dimo.py:745] Train loss_pg: 0.5971 (0.0535) +03/19 03:42:52 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.2812 (-10.1675) +03/19 03:42:52 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4789 (8.4781) +03/19 03:42:52 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:45:19 INFO train_distill_dimo.py:734] Iteration 8750, lr_s=1.38e-06 lr_a=1.38e-06, time=3.13s +03/19 03:45:19 INFO train_distill_dimo.py:745] 
Train H_mean: 10.7500 (10.1081) +03/19 03:45:19 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1826 (-0.1829) +03/19 03:45:19 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2399 (0.3363) +03/19 03:45:19 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1397 (0.1765) +03/19 03:45:19 INFO train_distill_dimo.py:745] Train loss_pg: 0.5514 (-0.1078) +03/19 03:45:19 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.7188 (-10.1116) +03/19 03:45:19 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4799 (8.4786) +03/19 03:45:19 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:47:47 INFO train_distill_dimo.py:734] Iteration 8800, lr_s=1.35e-06 lr_a=1.35e-06, time=2.70s +03/19 03:47:47 INFO train_distill_dimo.py:745] Train H_mean: 10.5938 (10.2075) +03/19 03:47:47 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1828 (-0.1814) +03/19 03:47:47 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2262 (0.2807) +03/19 03:47:47 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1358 (0.1578) +03/19 03:47:47 INFO train_distill_dimo.py:745] Train loss_pg: 0.5703 (-0.0844) +03/19 03:47:47 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5938 (-10.2075) +03/19 03:47:47 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4778 (8.4782) +03/19 03:47:47 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:50:15 INFO train_distill_dimo.py:734] Iteration 8850, lr_s=1.32e-06 lr_a=1.32e-06, time=2.75s +03/19 03:50:15 INFO train_distill_dimo.py:745] Train H_mean: 10.7812 (10.2350) +03/19 03:50:15 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1823 (-0.1824) +03/19 03:50:15 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2067 (0.2724) +03/19 03:50:15 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1423 (0.1612) +03/19 03:50:15 INFO train_distill_dimo.py:745] Train loss_pg: 0.7210 (0.0497) +03/19 03:50:15 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.7812 (-10.2500) +03/19 03:50:15 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4794 (8.4784) +03/19 03:50:15 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:52:43 INFO train_distill_dimo.py:734] Iteration 8900, lr_s=1.29e-06 lr_a=1.29e-06, time=2.71s +03/19 03:52:43 INFO train_distill_dimo.py:745] Train H_mean: 9.6562 (9.9453) +03/19 03:52:43 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1731 (-0.1712) +03/19 03:52:43 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.1832 (0.2342) +03/19 03:52:43 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1194 (0.1437) +03/19 03:52:43 INFO train_distill_dimo.py:745] Train loss_pg: 0.6041 (0.3259) +03/19 03:52:43 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.6562 (-9.9431) +03/19 03:52:43 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4735 (8.4765) +03/19 03:52:43 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:55:12 INFO train_distill_dimo.py:734] Iteration 8950, lr_s=1.27e-06 lr_a=1.27e-06, time=3.93s +03/19 03:55:12 INFO train_distill_dimo.py:745] Train H_mean: 10.5000 (10.2713) +03/19 03:55:12 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1733 (-0.1739) +03/19 03:55:12 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2075 (0.3371) +03/19 03:55:12 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1295 (0.1857) +03/19 03:55:12 INFO train_distill_dimo.py:745] Train loss_pg: 0.4022 (-0.4436) +03/19 03:55:12 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5625 
(-10.2831) +03/19 03:55:12 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4764 (8.4777) +03/19 03:55:12 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:57:39 INFO train_distill_dimo.py:734] Iteration 9000, lr_s=1.24e-06 lr_a=1.24e-06, time=3.20s +03/19 03:57:39 INFO train_distill_dimo.py:745] Train H_mean: 10.5938 (10.2375) +03/19 03:57:39 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1795 (-0.1807) +03/19 03:57:39 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.1997 (0.2598) +03/19 03:57:39 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1104 (0.1425) +03/19 03:57:39 INFO train_distill_dimo.py:745] Train loss_pg: 0.9349 (0.1889) +03/19 03:57:39 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5938 (-10.2363) +03/19 03:57:39 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4778 (8.4776) +03/19 03:57:39 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 03:57:39 INFO train_distill_dimo.py:667] < PROGRESS: 90.01% | SPEED: 2.922s / step | ETA: 0:48:41 > +03/19 03:57:57 INFO train_distill_dimo.py:720] [save] step=9000 → ./experiments/distill_dimo/checkpoints/checkpoint-9000 +03/19 04:00:25 INFO train_distill_dimo.py:734] Iteration 9050, lr_s=1.22e-06 lr_a=1.22e-06, time=2.73s +03/19 04:00:25 INFO train_distill_dimo.py:745] Train H_mean: 10.5625 (10.1084) +03/19 04:00:25 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1716 (-0.1718) +03/19 04:00:25 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2339 (0.3767) +03/19 04:00:25 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1281 (0.1995) +03/19 04:00:25 INFO train_distill_dimo.py:745] Train loss_pg: 0.4823 (-0.2695) +03/19 04:00:25 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5938 (-10.1147) +03/19 04:00:25 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4761 (8.4770) +03/19 04:00:25 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:02:53 INFO train_distill_dimo.py:734] Iteration 9100, lr_s=1.20e-06 lr_a=1.20e-06, time=2.71s +03/19 04:02:53 INFO train_distill_dimo.py:745] Train H_mean: 10.6875 (10.2531) +03/19 04:02:53 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1775 (-0.1808) +03/19 04:02:53 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2079 (0.2957) +03/19 04:02:53 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1306 (0.2099) +03/19 04:02:53 INFO train_distill_dimo.py:745] Train loss_pg: 0.5595 (-0.2153) +03/19 04:02:53 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.7188 (-10.2656) +03/19 04:02:53 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4775 (8.4786) +03/19 04:02:53 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:05:22 INFO train_distill_dimo.py:734] Iteration 9150, lr_s=1.18e-06 lr_a=1.18e-06, time=2.75s +03/19 04:05:22 INFO train_distill_dimo.py:745] Train H_mean: 10.7500 (10.2250) +03/19 04:05:22 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1859 (-0.1854) +03/19 04:05:22 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2374 (0.2949) +03/19 04:05:22 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1292 (0.1642) +03/19 04:05:22 INFO train_distill_dimo.py:745] Train loss_pg: 0.4969 (-0.0228) +03/19 04:05:22 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.7188 (-10.2250) +03/19 04:05:22 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4807 (8.4782) +03/19 04:05:22 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:07:52 INFO 
train_distill_dimo.py:734] Iteration 9200, lr_s=1.16e-06 lr_a=1.16e-06, time=3.53s +03/19 04:07:52 INFO train_distill_dimo.py:745] Train H_mean: 10.2812 (10.2562) +03/19 04:07:52 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1836 (-0.1838) +03/19 04:07:52 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2002 (0.2476) +03/19 04:07:52 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1236 (0.1361) +03/19 04:07:52 INFO train_distill_dimo.py:745] Train loss_pg: 0.6280 (-0.0562) +03/19 04:07:52 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.2812 (-10.2625) +03/19 04:07:52 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4801 (8.4800) +03/19 04:07:52 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:10:20 INFO train_distill_dimo.py:734] Iteration 9250, lr_s=1.14e-06 lr_a=1.14e-06, time=3.18s +03/19 04:10:20 INFO train_distill_dimo.py:745] Train H_mean: 10.4375 (10.1988) +03/19 04:10:20 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1700 (-0.1711) +03/19 04:10:20 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2109 (0.2644) +03/19 04:10:20 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1182 (0.1479) +03/19 04:10:20 INFO train_distill_dimo.py:745] Train loss_pg: 0.8637 (0.5049) +03/19 04:10:20 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.4688 (-10.2100) +03/19 04:10:20 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4773 (8.4768) +03/19 04:10:20 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:12:48 INFO train_distill_dimo.py:734] Iteration 9300, lr_s=1.12e-06 lr_a=1.12e-06, time=3.62s +03/19 04:12:48 INFO train_distill_dimo.py:745] Train H_mean: 10.3750 (10.2688) +03/19 04:12:48 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1748 (-0.1736) +03/19 04:12:48 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2489 (0.3213) +03/19 04:12:48 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1364 (0.1760) +03/19 04:12:48 INFO train_distill_dimo.py:745] Train loss_pg: 0.6649 (-0.3828) +03/19 04:12:48 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.3750 (-10.2750) +03/19 04:12:48 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4786 (8.4786) +03/19 04:12:48 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:15:16 INFO train_distill_dimo.py:734] Iteration 9350, lr_s=1.10e-06 lr_a=1.10e-06, time=2.74s +03/19 04:15:16 INFO train_distill_dimo.py:745] Train H_mean: 9.9375 (10.0512) +03/19 04:15:16 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1762 (-0.1766) +03/19 04:15:16 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.1914 (0.2347) +03/19 04:15:16 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1150 (0.1324) +03/19 04:15:16 INFO train_distill_dimo.py:745] Train loss_pg: 0.6774 (0.1006) +03/19 04:15:16 INFO train_distill_dimo.py:745] Train mean_logp_tok: -9.9062 (-10.0544) +03/19 04:15:16 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4792 (8.4774) +03/19 04:15:16 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:17:44 INFO train_distill_dimo.py:734] Iteration 9400, lr_s=1.09e-06 lr_a=1.09e-06, time=2.74s +03/19 04:17:44 INFO train_distill_dimo.py:745] Train H_mean: 10.0312 (10.2075) +03/19 04:17:44 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1626 (-0.1636) +03/19 04:17:44 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2338 (0.2966) +03/19 04:17:44 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1319 (0.1795) +03/19 04:17:44 INFO 
train_distill_dimo.py:745] Train loss_pg: 0.4617 (-0.0099) +03/19 04:17:44 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.0000 (-10.2188) +03/19 04:17:44 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4797 (8.4783) +03/19 04:17:44 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:20:13 INFO train_distill_dimo.py:734] Iteration 9450, lr_s=1.07e-06 lr_a=1.07e-06, time=2.74s +03/19 04:20:13 INFO train_distill_dimo.py:745] Train H_mean: 10.5625 (10.2325) +03/19 04:20:13 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1652 (-0.1655) +03/19 04:20:13 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2095 (0.2631) +03/19 04:20:13 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1183 (0.1493) +03/19 04:20:13 INFO train_distill_dimo.py:745] Train loss_pg: 0.6737 (0.0478) +03/19 04:20:13 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5938 (-10.2425) +03/19 04:20:13 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4790 (8.4790) +03/19 04:20:13 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:22:42 INFO train_distill_dimo.py:734] Iteration 9500, lr_s=1.06e-06 lr_a=1.06e-06, time=3.92s +03/19 04:22:42 INFO train_distill_dimo.py:745] Train H_mean: 10.0000 (10.1425) +03/19 04:22:42 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1577 (-0.1593) +03/19 04:22:42 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2305 (0.2583) +03/19 04:22:42 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1267 (0.1420) +03/19 04:22:42 INFO train_distill_dimo.py:745] Train loss_pg: 0.6449 (0.0572) +03/19 04:22:42 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.0312 (-10.1587) +03/19 04:22:42 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4772 (8.4772) +03/19 04:22:42 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:22:42 INFO train_distill_dimo.py:667] < PROGRESS: 95.01% | SPEED: 2.924s / step | ETA: 0:24:22 > +03/19 04:25:09 INFO train_distill_dimo.py:734] Iteration 9550, lr_s=1.05e-06 lr_a=1.05e-06, time=2.75s +03/19 04:25:09 INFO train_distill_dimo.py:745] Train H_mean: 10.7500 (10.2200) +03/19 04:25:09 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1591 (-0.1594) +03/19 04:25:09 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2120 (0.2484) +03/19 04:25:09 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1201 (0.1428) +03/19 04:25:09 INFO train_distill_dimo.py:745] Train loss_pg: 0.4657 (0.0043) +03/19 04:25:09 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.7500 (-10.2188) +03/19 04:25:09 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4779 (8.4781) +03/19 04:25:09 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:27:36 INFO train_distill_dimo.py:734] Iteration 9600, lr_s=1.04e-06 lr_a=1.04e-06, time=2.78s +03/19 04:27:36 INFO train_distill_dimo.py:745] Train H_mean: 10.0000 (10.1263) +03/19 04:27:36 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1571 (-0.1587) +03/19 04:27:36 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.1976 (0.2465) +03/19 04:27:36 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1172 (0.1461) +03/19 04:27:36 INFO train_distill_dimo.py:745] Train loss_pg: 0.4338 (-0.2517) +03/19 04:27:36 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.0000 (-10.1325) +03/19 04:27:36 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4793 (8.4785) +03/19 04:27:36 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:30:03 
INFO train_distill_dimo.py:734] Iteration 9650, lr_s=1.03e-06 lr_a=1.03e-06, time=2.76s +03/19 04:30:03 INFO train_distill_dimo.py:745] Train H_mean: 10.5312 (10.2137) +03/19 04:30:03 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1731 (-0.1724) +03/19 04:30:03 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2328 (0.2994) +03/19 04:30:03 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1245 (0.1647) +03/19 04:30:03 INFO train_distill_dimo.py:745] Train loss_pg: 0.4514 (-0.3150) +03/19 04:30:03 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5625 (-10.2150) +03/19 04:30:03 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4794 (8.4789) +03/19 04:30:03 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:32:30 INFO train_distill_dimo.py:734] Iteration 9700, lr_s=1.02e-06 lr_a=1.02e-06, time=2.73s +03/19 04:32:30 INFO train_distill_dimo.py:745] Train H_mean: 10.5625 (10.1800) +03/19 04:32:30 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1738 (-0.1740) +03/19 04:32:30 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2243 (0.2825) +03/19 04:32:30 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1263 (0.1498) +03/19 04:32:30 INFO train_distill_dimo.py:745] Train loss_pg: 0.6571 (0.0788) +03/19 04:32:30 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5938 (-10.1862) +03/19 04:32:30 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4759 (8.4779) +03/19 04:32:30 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:34:57 INFO train_distill_dimo.py:734] Iteration 9750, lr_s=1.02e-06 lr_a=1.02e-06, time=3.69s +03/19 04:34:57 INFO train_distill_dimo.py:745] Train H_mean: 10.5000 (10.0509) +03/19 04:34:57 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1716 (-0.1718) +03/19 04:34:57 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2468 (0.2915) +03/19 04:34:57 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1287 (0.1664) +03/19 04:34:57 INFO train_distill_dimo.py:745] Train loss_pg: 0.5428 (-0.0066) +03/19 04:34:57 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5000 (-10.0609) +03/19 04:34:57 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4771 (8.4778) +03/19 04:34:57 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:37:23 INFO train_distill_dimo.py:734] Iteration 9800, lr_s=1.01e-06 lr_a=1.01e-06, time=3.20s +03/19 04:37:23 INFO train_distill_dimo.py:745] Train H_mean: 10.4688 (10.1687) +03/19 04:37:23 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1643 (-0.1638) +03/19 04:37:23 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2266 (0.2575) +03/19 04:37:23 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1218 (0.1449) +03/19 04:37:23 INFO train_distill_dimo.py:745] Train loss_pg: 0.6353 (0.1869) +03/19 04:37:23 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.4688 (-10.1738) +03/19 04:37:23 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4788 (8.4784) +03/19 04:37:23 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:39:48 INFO train_distill_dimo.py:734] Iteration 9850, lr_s=1.01e-06 lr_a=1.01e-06, time=3.46s +03/19 04:39:48 INFO train_distill_dimo.py:745] Train H_mean: 10.2812 (10.2300) +03/19 04:39:48 INFO train_distill_dimo.py:745] Train baseline_ema: -0.2097 (-0.1943) +03/19 04:39:48 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.1810 (0.2563) +03/19 04:39:48 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1340 (0.2802) +03/19 04:39:48 
INFO train_distill_dimo.py:745] Train loss_pg: 0.8290 (-1.0039) +03/19 04:39:48 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.2812 (-10.2338) +03/19 04:39:48 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4787 (8.4777) +03/19 04:39:48 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:42:12 INFO train_distill_dimo.py:734] Iteration 9900, lr_s=1.00e-06 lr_a=1.00e-06, time=2.71s +03/19 04:42:12 INFO train_distill_dimo.py:745] Train H_mean: 10.4375 (10.2100) +03/19 04:42:12 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1905 (-0.1917) +03/19 04:42:12 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2039 (0.2923) +03/19 04:42:12 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1172 (0.1538) +03/19 04:42:12 INFO train_distill_dimo.py:745] Train loss_pg: 0.9188 (0.5046) +03/19 04:42:12 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.4375 (-10.2212) +03/19 04:42:12 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4779 (8.4784) +03/19 04:42:12 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:44:37 INFO train_distill_dimo.py:734] Iteration 9950, lr_s=1.00e-06 lr_a=1.00e-06, time=2.75s +03/19 04:44:37 INFO train_distill_dimo.py:745] Train H_mean: 10.5625 (10.1650) +03/19 04:44:37 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1794 (-0.1797) +03/19 04:44:37 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2184 (0.3039) +03/19 04:44:37 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1312 (0.1521) +03/19 04:44:37 INFO train_distill_dimo.py:745] Train loss_pg: 0.4721 (0.0211) +03/19 04:44:37 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.5625 (-10.1738) +03/19 04:44:37 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4802 (8.4780) +03/19 04:44:37 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:47:03 INFO train_distill_dimo.py:734] Iteration 10000, lr_s=1.00e-06 lr_a=1.00e-06, time=2.70s +03/19 04:47:03 INFO train_distill_dimo.py:745] Train H_mean: 10.4062 (10.2056) +03/19 04:47:03 INFO train_distill_dimo.py:745] Train baseline_ema: -0.1784 (-0.1794) +03/19 04:47:03 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.2134 (0.3865) +03/19 04:47:03 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.1207 (0.2072) +03/19 04:47:03 INFO train_distill_dimo.py:745] Train loss_pg: 0.7458 (-0.3472) +03/19 04:47:03 INFO train_distill_dimo.py:745] Train mean_logp_tok: -10.4375 (-10.2094) +03/19 04:47:03 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4777 (8.4789) +03/19 04:47:03 INFO train_distill_dimo.py:745] Train use_guided_ratio: 1.0000 (1.0000) +03/19 04:47:03 INFO train_distill_dimo.py:667] < PROGRESS: 100.01% | SPEED: 2.924s / step | ETA: 0:00:00 > +03/19 04:47:22 INFO train_distill_dimo.py:720] [save] step=10000 → ./experiments/distill_dimo/checkpoints/checkpoint-10000 +03/19 04:47:22 INFO train_distill_dimo.py:734] Iteration 10000, lr_s=1.00e-06 lr_a=1.00e-06, time=2.70s +03/19 04:47:41 INFO train_distill_dimo.py:720] [save] step=10000 → ./experiments/distill_dimo/checkpoints/checkpoint-final diff --git a/URSA/experiments/distill_dimo/logs/20260319_161604.log b/URSA/experiments/distill_dimo/logs/20260319_161604.log new file mode 100644 index 0000000000000000000000000000000000000000..4635f5b1417234301c3cf9b3e8901ecbfa40aec7 --- /dev/null +++ b/URSA/experiments/distill_dimo/logs/20260319_161604.log @@ -0,0 +1,72 @@ +03/19 16:16:04 INFO train_distill_dimo.py:905] Config: +experiment: + name: distill_dimo + output_dir: 
./experiments/distill_dimo + log_every: 50 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 49 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/19 16:16:04 INFO train_distill_dimo.py:160] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... diff --git a/URSA/experiments/distill_dimo/logs/20260319_162039.log b/URSA/experiments/distill_dimo/logs/20260319_162039.log new file mode 100644 index 0000000000000000000000000000000000000000..b9a1e7f54a746428f13e94ac914017c72d6c7d36 --- /dev/null +++ b/URSA/experiments/distill_dimo/logs/20260319_162039.log @@ -0,0 +1,72 @@ +03/19 16:20:39 INFO train_distill_dimo.py:905] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo + log_every: 50 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 49 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/19 16:20:39 INFO train_distill_dimo.py:160] [init] Loading 
teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... diff --git a/URSA/experiments/distill_dimo/logs/20260319_164116.log b/URSA/experiments/distill_dimo/logs/20260319_164116.log new file mode 100644 index 0000000000000000000000000000000000000000..9dc397d3339fafeeb3c21c4601c5cb52a58a0532 --- /dev/null +++ b/URSA/experiments/distill_dimo/logs/20260319_164116.log @@ -0,0 +1,77 @@ +03/19 16:41:16 INFO train_distill_dimo.py:905] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo + log_every: 50 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 49 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/19 16:41:16 INFO train_distill_dimo.py:160] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
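
Editor's note: the config dumps above declare `diffnext.engine.lr_scheduler.CosineLR` with `lr_max=1.0e-05`, `lr_min=1.0e-06`, `warmup_steps=500`, `max_steps=10000`, and the `lr_s` values in the iteration logs (ramping to `1.00e-05` by step 500, then decaying through `3.04e-06` at step 7000 and `1.95e-06` at step 8000 toward the `1.00e-06` floor) are consistent with a linear warmup followed by cosine decay. Below is a minimal sketch of that schedule under those assumptions; it is a reconstruction from the config and the logged values, not the actual `CosineLR` implementation.

```python
# Sketch of the schedule implied by the lr_scheduler config above:
# linear warmup from 0 to lr_max, then cosine decay to lr_min.
# Assumption-based reconstruction, not diffnext's CosineLR source.
import math

def cosine_lr(step, lr_max=1.0e-5, lr_min=1.0e-6,
              warmup_steps=500, max_steps=10000):
    if step < warmup_steps:
        # Linear warmup; approximately matches lr_s=1.01e-06 logged at step 50.
        return lr_max * step / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

# Spot checks against the iteration logs:
# cosine_lr(7000) ~ 3.05e-06  (logged 3.04e-06)
# cosine_lr(8000) ~ 1.95e-06  (logged 1.95e-06)
# cosine_lr(9500) ~ 1.06e-06  (logged 1.06e-06)
```
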
+03/19 16:42:39 INFO train_distill_dimo.py:183] [init] latents_shape=(13,40,64) N=33280 K=64000 CFG=ON +03/19 16:42:52 INFO train_distill_dimo.py:295] [init] verified_native_regime=True geometry=(49×320×512) teacher_cfg_scale=7.0 +03/19 16:42:52 INFO train_distill_dimo.py:313] [init] student params: 1982.17M +03/19 16:42:52 INFO train_distill_dimo.py:316] [init] max_train_steps=10000 batch_size_per_gpu=1 num_processes=8 +03/19 16:42:52 INFO train_distill_dimo.py:687] [train] Starting from step 0 / 10000 diff --git a/URSA/experiments/distill_dimo/logs/20260319_164915.log b/URSA/experiments/distill_dimo/logs/20260319_164915.log new file mode 100644 index 0000000000000000000000000000000000000000..cabdbdbda5584310e006f3702cb46ca24a39d215 --- /dev/null +++ b/URSA/experiments/distill_dimo/logs/20260319_164915.log @@ -0,0 +1,461 @@ +03/19 16:49:15 INFO train_distill_dimo.py:905] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo + log_every: 50 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/19 16:49:15 INFO train_distill_dimo.py:160] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
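
Editor's note: the `[init] latents_shape` lines make the token-count arithmetic explicit. With 8x spatial downsampling and 4x temporal compression (one anchor frame kept uncompressed), 49 frames at 320x512 give a (13,40,64) latent grid, i.e. N = 13*40*64 = 33280 tokens over a K = 64000 codebook, while the 17-frame run below gives (5,40,64) = 12800 tokens. A short sketch of that arithmetic follows; the 4x/8x factors are inferred from the logged shapes, not read out of `train_distill_dimo.py`.

```python
# Latent-geometry arithmetic behind the "[init] latents_shape" log lines.
# The temporal (4x, plus one anchor frame) and spatial (8x) strides are
# assumptions inferred from the logged shapes.
def latent_shape(num_frames, height, width, t_stride=4, s_stride=8):
    t = (num_frames - 1) // t_stride + 1  # first frame kept, rest compressed
    return t, height // s_stride, width // s_stride

for frames in (49, 17):
    t, h, w = latent_shape(frames, 320, 512)
    print(f"{frames} frames -> latents_shape=({t},{h},{w}) N={t * h * w}")
# 49 frames -> latents_shape=(13,40,64) N=33280
# 17 frames -> latents_shape=(5,40,64)  N=12800
```
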
+03/19 16:50:38 INFO train_distill_dimo.py:183] [init] latents_shape=(5,40,64) N=12800 K=64000 CFG=ON +03/19 16:50:54 INFO train_distill_dimo.py:295] [init] verified_native_regime=False geometry=(17×320×512) teacher_cfg_scale=7.0 +03/19 16:50:54 INFO train_distill_dimo.py:313] [init] student params: 1982.17M +03/19 16:50:54 INFO train_distill_dimo.py:316] [init] max_train_steps=10000 batch_size_per_gpu=1 num_processes=8 +03/19 16:50:54 INFO train_distill_dimo.py:687] [train] Starting from step 0 / 10000 +03/19 16:58:06 INFO train_distill_dimo.py:768] Iteration 50, lr_s=1.01e-06 lr_a=1.01e-06, time=7.76s +03/19 16:58:06 INFO train_distill_dimo.py:779] Train H_mean: 2.8594 (3.0294) +03/19 16:58:06 INFO train_distill_dimo.py:779] Train baseline_ema: -0.0016 (-0.0020) +03/19 16:58:06 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.0094 (0.0098) +03/19 16:58:06 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.0053 (0.0136) +03/19 16:58:06 INFO train_distill_dimo.py:779] Train loss_pg: -0.0106 (-0.0366) +03/19 16:58:06 INFO train_distill_dimo.py:779] Train mean_logp_tok: -2.8438 (-3.0166) +03/19 16:58:06 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2587 (9.2600) +03/19 16:58:06 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.0000) +03/19 17:04:39 INFO train_distill_dimo.py:768] Iteration 100, lr_s=2.01e-06 lr_a=2.01e-06, time=7.76s +03/19 17:04:39 INFO train_distill_dimo.py:779] Train H_mean: 4.3125 (4.3175) +03/19 17:04:39 INFO train_distill_dimo.py:779] Train baseline_ema: -0.0110 (-0.0120) +03/19 17:04:39 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.0144 (0.1573) +03/19 17:04:39 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.0199 (0.0879) +03/19 17:04:39 INFO train_distill_dimo.py:779] Train loss_pg: -0.0380 (-0.1233) +03/19 17:04:39 INFO train_distill_dimo.py:779] Train mean_logp_tok: -4.1250 (-4.2559) +03/19 17:04:39 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2290 (9.2239) +03/19 17:04:39 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.0200) +03/19 17:11:12 INFO train_distill_dimo.py:768] Iteration 150, lr_s=3.01e-06 lr_a=3.01e-06, time=8.14s +03/19 17:11:12 INFO train_distill_dimo.py:779] Train H_mean: 7.2969 (7.0166) +03/19 17:11:12 INFO train_distill_dimo.py:779] Train baseline_ema: -0.0262 (-0.0258) +03/19 17:11:12 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.0275 (0.0591) +03/19 17:11:12 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.0311 (0.0540) +03/19 17:11:12 INFO train_distill_dimo.py:779] Train loss_pg: -0.0276 (-0.1225) +03/19 17:11:12 INFO train_distill_dimo.py:779] Train mean_logp_tok: -7.2188 (-6.9188) +03/19 17:11:12 INFO train_distill_dimo.py:779] Train tok_entropy: 9.1063 (8.9088) +03/19 17:11:12 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.0600) +03/19 17:17:45 INFO train_distill_dimo.py:768] Iteration 200, lr_s=4.01e-06 lr_a=4.01e-06, time=7.75s +03/19 17:17:45 INFO train_distill_dimo.py:779] Train H_mean: 4.0000 (4.5485) +03/19 17:17:45 INFO train_distill_dimo.py:779] Train baseline_ema: -0.0710 (-0.0698) +03/19 17:17:45 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.0371 (0.0923) +03/19 17:17:45 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.0832 (0.2267) +03/19 17:17:45 INFO train_distill_dimo.py:779] Train loss_pg: -0.0184 (-0.9006) +03/19 17:17:45 INFO train_distill_dimo.py:779] Train mean_logp_tok: -4.1797 (-5.1238) +03/19 17:17:45 INFO train_distill_dimo.py:779] Train tok_entropy: 9.0484 (8.7852) +03/19 17:17:45 INFO 
train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.0200) +03/19 17:24:18 INFO train_distill_dimo.py:768] Iteration 250, lr_s=5.00e-06 lr_a=5.00e-06, time=7.75s +03/19 17:24:18 INFO train_distill_dimo.py:779] Train H_mean: 5.4375 (5.4980) +03/19 17:24:18 INFO train_distill_dimo.py:779] Train baseline_ema: -0.1138 (-0.1132) +03/19 17:24:18 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.1023 (0.3877) +03/19 17:24:18 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.1147 (0.2246) +03/19 17:24:18 INFO train_distill_dimo.py:779] Train loss_pg: 0.0314 (-0.1165) +03/19 17:24:18 INFO train_distill_dimo.py:779] Train mean_logp_tok: -5.7188 (-5.5380) +03/19 17:24:18 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2524 (9.2532) +03/19 17:24:18 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.1200) +03/19 17:30:52 INFO train_distill_dimo.py:768] Iteration 300, lr_s=6.00e-06 lr_a=6.00e-06, time=7.76s +03/19 17:30:52 INFO train_distill_dimo.py:779] Train H_mean: 6.6094 (5.7861) +03/19 17:30:52 INFO train_distill_dimo.py:779] Train baseline_ema: -0.1145 (-0.1290) +03/19 17:30:52 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.1552 (0.7962) +03/19 17:30:52 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.1372 (0.5014) +03/19 17:30:52 INFO train_distill_dimo.py:779] Train loss_pg: 0.0296 (-0.6510) +03/19 17:30:52 INFO train_distill_dimo.py:779] Train mean_logp_tok: -7.5000 (-6.3627) +03/19 17:30:52 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2392 (9.2406) +03/19 17:30:52 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.1800) +03/19 17:37:25 INFO train_distill_dimo.py:768] Iteration 350, lr_s=7.00e-06 lr_a=7.00e-06, time=7.78s +03/19 17:37:25 INFO train_distill_dimo.py:779] Train H_mean: 4.5312 (4.0909) +03/19 17:37:25 INFO train_distill_dimo.py:779] Train baseline_ema: -0.2497 (-0.2485) +03/19 17:37:25 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.2132 (0.9028) +03/19 17:37:25 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.1886 (0.6320) +03/19 17:37:25 INFO train_distill_dimo.py:779] Train loss_pg: 0.2904 (-0.2729) +03/19 17:37:25 INFO train_distill_dimo.py:779] Train mean_logp_tok: -4.5938 (-4.1177) +03/19 17:37:25 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2619 (9.2616) +03/19 17:37:25 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.1000) +03/19 17:43:58 INFO train_distill_dimo.py:768] Iteration 400, lr_s=8.00e-06 lr_a=8.00e-06, time=7.76s +03/19 17:43:58 INFO train_distill_dimo.py:779] Train H_mean: 7.5000 (7.2131) +03/19 17:43:58 INFO train_distill_dimo.py:779] Train baseline_ema: -0.2237 (-0.2248) +03/19 17:43:58 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.1704 (0.8440) +03/19 17:43:58 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.1251 (0.3764) +03/19 17:43:58 INFO train_distill_dimo.py:779] Train loss_pg: 0.6873 (0.3809) +03/19 17:43:58 INFO train_distill_dimo.py:779] Train mean_logp_tok: -7.3906 (-7.2544) +03/19 17:43:58 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2612 (9.2615) +03/19 17:43:58 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.1800) +03/19 17:50:31 INFO train_distill_dimo.py:768] Iteration 450, lr_s=9.00e-06 lr_a=9.00e-06, time=7.75s +03/19 17:50:31 INFO train_distill_dimo.py:779] Train H_mean: 7.0000 (6.5016) +03/19 17:50:31 INFO train_distill_dimo.py:779] Train baseline_ema: -0.2241 (-0.3828) +03/19 17:50:31 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.2721 (1.6752) +03/19 17:50:31 INFO 
train_distill_dimo.py:779] Train loss_kd_cond: 0.2430 (1.6196) +03/19 17:50:31 INFO train_distill_dimo.py:779] Train loss_pg: 0.0657 (-9.0251) +03/19 17:50:31 INFO train_distill_dimo.py:779] Train mean_logp_tok: -7.4219 (-6.5478) +03/19 17:50:31 INFO train_distill_dimo.py:779] Train tok_entropy: 8.9185 (8.6385) +03/19 17:50:31 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.2000) +03/19 17:57:03 INFO train_distill_dimo.py:768] Iteration 500, lr_s=1.00e-05 lr_a=1.00e-05, time=8.08s +03/19 17:57:03 INFO train_distill_dimo.py:779] Train H_mean: 5.9062 (7.0566) +03/19 17:57:03 INFO train_distill_dimo.py:779] Train baseline_ema: -0.8602 (-0.8560) +03/19 17:57:03 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.1767 (0.8937) +03/19 17:57:03 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.1838 (1.2013) +03/19 17:57:03 INFO train_distill_dimo.py:779] Train loss_pg: 3.7072 (-1.2465) +03/19 17:57:03 INFO train_distill_dimo.py:779] Train mean_logp_tok: -5.7812 (-7.0219) +03/19 17:57:03 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2339 (9.1681) +03/19 17:57:03 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.2600) +03/19 17:57:03 INFO train_distill_dimo.py:701] < PROGRESS: 5.01% | SPEED: 7.937s / step | ETA: 20:56:44 > +03/19 18:03:35 INFO train_distill_dimo.py:768] Iteration 550, lr_s=1.00e-05 lr_a=1.00e-05, time=7.75s +03/19 18:03:35 INFO train_distill_dimo.py:779] Train H_mean: 7.2969 (7.6984) +03/19 18:03:35 INFO train_distill_dimo.py:779] Train baseline_ema: -0.6165 (-0.6278) +03/19 18:03:35 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.2970 (1.2176) +03/19 18:03:35 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.1592 (0.5286) +03/19 18:03:35 INFO train_distill_dimo.py:779] Train loss_pg: 3.5843 (2.9728) +03/19 18:03:35 INFO train_distill_dimo.py:779] Train mean_logp_tok: -7.6094 (-7.8256) +03/19 18:03:35 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2532 (9.2424) +03/19 18:03:35 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.3800) +03/19 18:10:09 INFO train_distill_dimo.py:768] Iteration 600, lr_s=1.00e-05 lr_a=1.00e-05, time=7.78s +03/19 18:10:09 INFO train_distill_dimo.py:779] Train H_mean: 7.7031 (7.7175) +03/19 18:10:09 INFO train_distill_dimo.py:779] Train baseline_ema: -0.5455 (-0.5580) +03/19 18:10:09 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.2949 (0.7836) +03/19 18:10:09 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.2417 (0.6909) +03/19 18:10:09 INFO train_distill_dimo.py:779] Train loss_pg: 2.8776 (0.6671) +03/19 18:10:09 INFO train_distill_dimo.py:779] Train mean_logp_tok: -7.7969 (-7.7994) +03/19 18:10:09 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2261 (9.2291) +03/19 18:10:09 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.2600) +03/19 18:16:42 INFO train_distill_dimo.py:768] Iteration 650, lr_s=9.99e-06 lr_a=9.99e-06, time=7.74s +03/19 18:16:42 INFO train_distill_dimo.py:779] Train H_mean: 9.3750 (9.0525) +03/19 18:16:42 INFO train_distill_dimo.py:779] Train baseline_ema: -0.4772 (-0.4757) +03/19 18:16:42 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.2537 (1.3278) +03/19 18:16:42 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.2189 (0.6701) +03/19 18:16:42 INFO train_distill_dimo.py:779] Train loss_pg: 2.8049 (1.2257) +03/19 18:16:42 INFO train_distill_dimo.py:779] Train mean_logp_tok: -9.5000 (-9.2506) +03/19 18:16:42 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2693 (9.2664) +03/19 18:16:42 INFO 
train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.3000) +03/19 18:23:15 INFO train_distill_dimo.py:768] Iteration 700, lr_s=9.99e-06 lr_a=9.99e-06, time=7.75s +03/19 18:23:15 INFO train_distill_dimo.py:779] Train H_mean: 9.6875 (8.7428) +03/19 18:23:15 INFO train_distill_dimo.py:779] Train baseline_ema: -0.3889 (-0.3922) +03/19 18:23:15 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.3234 (1.5320) +03/19 18:23:15 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.2077 (0.5574) +03/19 18:23:15 INFO train_distill_dimo.py:779] Train loss_pg: 1.5258 (1.5522) +03/19 18:23:15 INFO train_distill_dimo.py:779] Train mean_logp_tok: -9.7500 (-8.7631) +03/19 18:23:15 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2721 (9.2733) +03/19 18:23:15 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.3800) +03/19 18:29:48 INFO train_distill_dimo.py:768] Iteration 750, lr_s=9.98e-06 lr_a=9.98e-06, time=7.76s +03/19 18:29:48 INFO train_distill_dimo.py:779] Train H_mean: 10.2812 (9.5481) +03/19 18:29:48 INFO train_distill_dimo.py:779] Train baseline_ema: -0.3250 (-0.3255) +03/19 18:29:48 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.7215 (1.1445) +03/19 18:29:48 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.3413 (0.4856) +03/19 18:29:48 INFO train_distill_dimo.py:779] Train loss_pg: 1.0390 (0.9162) +03/19 18:29:48 INFO train_distill_dimo.py:779] Train mean_logp_tok: -10.3125 (-9.6062) +03/19 18:29:48 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2904 (9.2903) +03/19 18:29:48 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.3800) +03/19 18:36:22 INFO train_distill_dimo.py:768] Iteration 800, lr_s=9.98e-06 lr_a=9.98e-06, time=7.78s +03/19 18:36:22 INFO train_distill_dimo.py:779] Train H_mean: 7.3750 (6.7056) +03/19 18:36:22 INFO train_distill_dimo.py:779] Train baseline_ema: -0.2894 (-0.2892) +03/19 18:36:22 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.7839 (1.9923) +03/19 18:36:22 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.2249 (0.7812) +03/19 18:36:22 INFO train_distill_dimo.py:779] Train loss_pg: 0.4741 (0.5587) +03/19 18:36:22 INFO train_distill_dimo.py:779] Train mean_logp_tok: -6.4531 (-6.6833) +03/19 18:36:22 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2729 (9.2661) +03/19 18:36:22 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.5000 (0.5000) +03/19 18:42:55 INFO train_distill_dimo.py:768] Iteration 850, lr_s=9.97e-06 lr_a=9.97e-06, time=8.18s +03/19 18:42:55 INFO train_distill_dimo.py:779] Train H_mean: 9.7812 (8.4044) +03/19 18:42:55 INFO train_distill_dimo.py:779] Train baseline_ema: -0.2870 (-0.2891) +03/19 18:42:55 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.4869 (1.0842) +03/19 18:42:55 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.2853 (0.6019) +03/19 18:42:55 INFO train_distill_dimo.py:779] Train loss_pg: 0.2459 (-1.0213) +03/19 18:42:55 INFO train_distill_dimo.py:779] Train mean_logp_tok: -10.0000 (-8.4763) +03/19 18:42:55 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2736 (9.2732) +03/19 18:42:55 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.3800) +03/19 18:49:28 INFO train_distill_dimo.py:768] Iteration 900, lr_s=9.96e-06 lr_a=9.96e-06, time=7.80s +03/19 18:49:28 INFO train_distill_dimo.py:779] Train H_mean: 9.2812 (7.4628) +03/19 18:49:28 INFO train_distill_dimo.py:779] Train baseline_ema: -0.2953 (-0.3034) +03/19 18:49:28 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.5380 (1.1995) +03/19 18:49:28 INFO 
train_distill_dimo.py:779] Train loss_kd_cond: 0.3752 (0.7184) +03/19 18:49:28 INFO train_distill_dimo.py:779] Train loss_pg: 0.7723 (-1.0950) +03/19 18:49:28 INFO train_distill_dimo.py:779] Train mean_logp_tok: -9.3750 (-7.4798) +03/19 18:49:28 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2587 (9.2565) +03/19 18:49:28 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.5200) +03/19 18:56:02 INFO train_distill_dimo.py:768] Iteration 950, lr_s=9.95e-06 lr_a=9.95e-06, time=7.75s +03/19 18:56:02 INFO train_distill_dimo.py:779] Train H_mean: 6.5000 (6.9136) +03/19 18:56:02 INFO train_distill_dimo.py:779] Train baseline_ema: -0.9312 (-0.8037) +03/19 18:56:02 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.7040 (1.2711) +03/19 18:56:02 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.9207 (2.3447) +03/19 18:56:02 INFO train_distill_dimo.py:779] Train loss_pg: 2.3300 (-11.3094) +03/19 18:56:02 INFO train_distill_dimo.py:779] Train mean_logp_tok: -6.8125 (-7.4486) +03/19 18:56:02 INFO train_distill_dimo.py:779] Train tok_entropy: 9.1172 (9.0072) +03/19 18:56:02 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.5400) +03/19 19:02:35 INFO train_distill_dimo.py:768] Iteration 1000, lr_s=9.94e-06 lr_a=9.94e-06, time=7.75s +03/19 19:02:35 INFO train_distill_dimo.py:779] Train H_mean: 8.3438 (8.1094) +03/19 19:02:35 INFO train_distill_dimo.py:779] Train baseline_ema: -0.7931 (-0.7993) +03/19 19:02:35 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.6991 (1.4599) +03/19 19:02:35 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.2905 (0.5846) +03/19 19:02:35 INFO train_distill_dimo.py:779] Train loss_pg: 4.6670 (4.2655) +03/19 19:02:35 INFO train_distill_dimo.py:779] Train mean_logp_tok: -8.4062 (-8.1131) +03/19 19:02:35 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2123 (9.1875) +03/19 19:02:35 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.5000 (0.5000) +03/19 19:02:35 INFO train_distill_dimo.py:701] < PROGRESS: 10.01% | SPEED: 7.901s / step | ETA: 19:45:11 > +03/19 19:03:00 INFO train_distill_dimo.py:754] [save] step=1000 → ./experiments/distill_dimo/checkpoints/checkpoint-1000 +03/19 19:09:34 INFO train_distill_dimo.py:768] Iteration 1050, lr_s=9.93e-06 lr_a=9.93e-06, time=7.76s +03/19 19:09:34 INFO train_distill_dimo.py:779] Train H_mean: 10.0000 (9.4941) +03/19 19:09:34 INFO train_distill_dimo.py:779] Train baseline_ema: -0.5924 (-0.5995) +03/19 19:09:34 INFO train_distill_dimo.py:779] Train loss_aux_cond: 1.0649 (1.4831) +03/19 19:09:34 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.5877 (0.6828) +03/19 19:09:34 INFO train_distill_dimo.py:779] Train loss_pg: 2.9099 (2.5554) +03/19 19:09:34 INFO train_distill_dimo.py:779] Train mean_logp_tok: -10.0312 (-9.5747) +03/19 19:09:34 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2851 (9.2855) +03/19 19:09:34 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.6000) +03/19 19:16:09 INFO train_distill_dimo.py:768] Iteration 1100, lr_s=9.91e-06 lr_a=9.91e-06, time=7.78s +03/19 19:16:09 INFO train_distill_dimo.py:779] Train H_mean: 10.0312 (8.7281) +03/19 19:16:09 INFO train_distill_dimo.py:779] Train baseline_ema: -0.4945 (-0.4915) +03/19 19:16:09 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.9715 (1.7120) +03/19 19:16:09 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.5507 (0.7059) +03/19 19:16:09 INFO train_distill_dimo.py:779] Train loss_pg: 2.1091 (1.6679) +03/19 19:16:09 INFO train_distill_dimo.py:779] Train mean_logp_tok: -10.0938 
(-8.8691) +03/19 19:16:09 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2847 (9.2857) +03/19 19:16:09 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.6400) +03/19 19:22:43 INFO train_distill_dimo.py:768] Iteration 1150, lr_s=9.90e-06 lr_a=9.90e-06, time=7.76s +03/19 19:22:43 INFO train_distill_dimo.py:779] Train H_mean: 9.4062 (8.2672) +03/19 19:22:43 INFO train_distill_dimo.py:779] Train baseline_ema: -0.4207 (-0.4213) +03/19 19:22:43 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.6790 (1.5064) +03/19 19:22:43 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.3855 (0.6045) +03/19 19:22:43 INFO train_distill_dimo.py:779] Train loss_pg: 0.9687 (0.7968) +03/19 19:22:43 INFO train_distill_dimo.py:779] Train mean_logp_tok: -9.6875 (-8.3375) +03/19 19:22:43 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2830 (9.2840) +03/19 19:22:43 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.4400) +03/19 19:29:18 INFO train_distill_dimo.py:768] Iteration 1200, lr_s=9.88e-06 lr_a=9.88e-06, time=8.15s +03/19 19:29:18 INFO train_distill_dimo.py:779] Train H_mean: 10.3750 (9.5644) +03/19 19:29:18 INFO train_distill_dimo.py:779] Train baseline_ema: -0.3776 (-0.3805) +03/19 19:29:18 INFO train_distill_dimo.py:779] Train loss_aux_cond: 1.1127 (1.6176) +03/19 19:29:18 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.5343 (0.6155) +03/19 19:29:18 INFO train_distill_dimo.py:779] Train loss_pg: 0.8713 (0.5784) +03/19 19:29:18 INFO train_distill_dimo.py:779] Train mean_logp_tok: -10.3750 (-9.7384) +03/19 19:29:18 INFO train_distill_dimo.py:779] Train tok_entropy: 9.3020 (9.2997) +03/19 19:29:18 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.5800) +03/19 19:35:53 INFO train_distill_dimo.py:768] Iteration 1250, lr_s=9.86e-06 lr_a=9.86e-06, time=7.77s +03/19 19:35:53 INFO train_distill_dimo.py:779] Train H_mean: 6.2344 (6.4325) +03/19 19:35:53 INFO train_distill_dimo.py:779] Train baseline_ema: -0.3720 (-0.3769) +03/19 19:35:53 INFO train_distill_dimo.py:779] Train loss_aux_cond: 1.1400 (1.7296) +03/19 19:35:53 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.5933 (0.7232) +03/19 19:35:53 INFO train_distill_dimo.py:779] Train loss_pg: -0.0565 (-0.2603) +03/19 19:35:53 INFO train_distill_dimo.py:779] Train mean_logp_tok: -5.5938 (-6.4717) +03/19 19:35:53 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2751 (9.2716) +03/19 19:35:53 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.6600) +03/19 19:42:29 INFO train_distill_dimo.py:768] Iteration 1300, lr_s=9.84e-06 lr_a=9.84e-06, time=7.74s +03/19 19:42:29 INFO train_distill_dimo.py:779] Train H_mean: 7.0312 (6.5680) +03/19 19:42:29 INFO train_distill_dimo.py:779] Train baseline_ema: -0.4135 (-0.4131) +03/19 19:42:29 INFO train_distill_dimo.py:779] Train loss_aux_cond: 1.0752 (1.4746) +03/19 19:42:29 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.6118 (0.7230) +03/19 19:42:29 INFO train_distill_dimo.py:779] Train loss_pg: 0.0742 (-0.9657) +03/19 19:42:29 INFO train_distill_dimo.py:779] Train mean_logp_tok: -7.2656 (-6.9478) +03/19 19:42:29 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2594 (9.1774) +03/19 19:42:29 INFO train_distill_dimo.py:779] Train use_guided_ratio: 0.0000 (0.4800) +03/19 19:49:04 INFO train_distill_dimo.py:768] Iteration 1350, lr_s=9.82e-06 lr_a=9.82e-06, time=8.21s +03/19 19:49:04 INFO train_distill_dimo.py:779] Train H_mean: 5.5156 (5.7902) +03/19 19:49:04 INFO train_distill_dimo.py:779] Train baseline_ema: -0.4340 (-0.4354) 
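Note: every Train metric above is printed as "latest (running average)", e.g. Train loss_pg: 0.7235 (0.1374). A minimal sketch of such a meter, assuming an exponential moving average (the exact smoothing used by train_distill_dimo.py is not shown in these logs):

    class SmoothedValue:
        """Track the latest value and a running average, rendered as 'val (avg)'."""

        def __init__(self, momentum=0.9):
            self.momentum = momentum
            self.val = 0.0
            self.avg = None

        def update(self, val):
            self.val = val
            if self.avg is None:
                self.avg = val
            else:
                self.avg = self.momentum * self.avg + (1.0 - self.momentum) * val

        def __str__(self):
            return f"{self.val:.4f} ({self.avg:.4f})"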
+03/19 19:49:04 INFO train_distill_dimo.py:779] Train loss_aux_cond: 1.2474 (1.7728) +03/19 19:49:04 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.6224 (0.8145) +03/19 19:49:04 INFO train_distill_dimo.py:779] Train loss_pg: 0.7235 (0.1374) +03/19 19:49:04 INFO train_distill_dimo.py:779] Train mean_logp_tok: -5.4531 (-5.8258) +03/19 19:49:04 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2657 (8.9922) +03/19 19:49:04 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.7600) +03/19 19:55:39 INFO train_distill_dimo.py:768] Iteration 1400, lr_s=9.80e-06 lr_a=9.80e-06, time=7.79s +03/19 19:55:39 INFO train_distill_dimo.py:779] Train H_mean: 6.3125 (6.4041) +03/19 19:55:39 INFO train_distill_dimo.py:779] Train baseline_ema: -0.4129 (-0.4127) +03/19 19:55:39 INFO train_distill_dimo.py:779] Train loss_aux_cond: 1.1952 (1.3467) +03/19 19:55:39 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.6213 (0.6020) +03/19 19:55:39 INFO train_distill_dimo.py:779] Train loss_pg: 0.2498 (0.0275) +03/19 19:55:39 INFO train_distill_dimo.py:779] Train mean_logp_tok: -7.1875 (-6.9209) +03/19 19:55:39 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2668 (9.1434) +03/19 19:55:39 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.6200) +03/19 20:02:13 INFO train_distill_dimo.py:768] Iteration 1450, lr_s=9.78e-06 lr_a=9.78e-06, time=7.73s +03/19 20:02:13 INFO train_distill_dimo.py:779] Train H_mean: 8.0938 (7.9216) +03/19 20:02:13 INFO train_distill_dimo.py:779] Train baseline_ema: -0.4346 (-0.5879) +03/19 20:02:13 INFO train_distill_dimo.py:779] Train loss_aux_cond: 1.3236 (1.5290) +03/19 20:02:13 INFO train_distill_dimo.py:779] Train loss_kd_cond: 1.1871 (2.3815) +03/19 20:02:13 INFO train_distill_dimo.py:779] Train loss_pg: -3.4987 (-12.3918) +03/19 20:02:13 INFO train_distill_dimo.py:779] Train mean_logp_tok: -8.1875 (-8.0681) +03/19 20:02:13 INFO train_distill_dimo.py:779] Train tok_entropy: 9.1865 (8.4484) +03/19 20:02:13 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.7200) +03/19 20:08:48 INFO train_distill_dimo.py:768] Iteration 1500, lr_s=9.76e-06 lr_a=9.76e-06, time=7.98s +03/19 20:08:48 INFO train_distill_dimo.py:779] Train H_mean: 9.0312 (8.4213) +03/19 20:08:48 INFO train_distill_dimo.py:779] Train baseline_ema: -1.7657 (-1.7990) +03/19 20:08:48 INFO train_distill_dimo.py:779] Train loss_aux_cond: 1.7425 (2.1735) +03/19 20:08:48 INFO train_distill_dimo.py:779] Train loss_kd_cond: 4.3752 (4.6560) +03/19 20:08:48 INFO train_distill_dimo.py:779] Train loss_pg: -18.5390 (-21.3375) +03/19 20:08:48 INFO train_distill_dimo.py:779] Train mean_logp_tok: -9.1562 (-8.9637) +03/19 20:08:48 INFO train_distill_dimo.py:779] Train tok_entropy: 8.1688 (7.8814) +03/19 20:08:48 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.7600) +03/19 20:08:48 INFO train_distill_dimo.py:701] < PROGRESS: 15.01% | SPEED: 7.899s / step | ETA: 18:39:04 > +03/19 20:15:23 INFO train_distill_dimo.py:768] Iteration 1550, lr_s=9.73e-06 lr_a=9.73e-06, time=8.46s +03/19 20:15:23 INFO train_distill_dimo.py:779] Train H_mean: 8.3438 (8.0612) +03/19 20:15:23 INFO train_distill_dimo.py:779] Train baseline_ema: -1.8871 (-1.9090) +03/19 20:15:23 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.8919 (0.9040) +03/19 20:15:23 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.5923 (0.7512) +03/19 20:15:23 INFO train_distill_dimo.py:779] Train loss_pg: 14.0663 (12.1146) +03/19 20:15:23 INFO train_distill_dimo.py:779] Train mean_logp_tok: -8.3750 (-8.1931) +03/19 
20:15:23 INFO train_distill_dimo.py:779] Train tok_entropy: 8.1246 (8.1679) +03/19 20:15:23 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.7400) +03/19 20:21:59 INFO train_distill_dimo.py:768] Iteration 1600, lr_s=9.71e-06 lr_a=9.71e-06, time=7.75s +03/19 20:21:59 INFO train_distill_dimo.py:779] Train H_mean: 9.0625 (8.8488) +03/19 20:21:59 INFO train_distill_dimo.py:779] Train baseline_ema: -1.3301 (-1.3397) +03/19 20:21:59 INFO train_distill_dimo.py:779] Train loss_aux_cond: 1.0172 (1.4044) +03/19 20:21:59 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.6396 (0.6614) +03/19 20:21:59 INFO train_distill_dimo.py:779] Train loss_pg: 8.3236 (7.8903) +03/19 20:21:59 INFO train_distill_dimo.py:779] Train mean_logp_tok: -9.3125 (-8.8606) +03/19 20:21:59 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2595 (9.2547) +03/19 20:21:59 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.7600) +03/19 20:28:35 INFO train_distill_dimo.py:768] Iteration 1650, lr_s=9.68e-06 lr_a=9.68e-06, time=7.79s +03/19 20:28:35 INFO train_distill_dimo.py:779] Train H_mean: 10.0938 (9.1205) +03/19 20:28:35 INFO train_distill_dimo.py:779] Train baseline_ema: -1.0133 (-1.0072) +03/19 20:28:35 INFO train_distill_dimo.py:779] Train loss_aux_cond: 1.4883 (1.6081) +03/19 20:28:35 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.6902 (0.7345) +03/19 20:28:35 INFO train_distill_dimo.py:779] Train loss_pg: 4.8721 (4.3430) +03/19 20:28:35 INFO train_distill_dimo.py:779] Train mean_logp_tok: -10.0938 (-9.1186) +03/19 20:28:35 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2922 (9.2909) +03/19 20:28:35 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.6600) +03/19 20:35:11 INFO train_distill_dimo.py:768] Iteration 1700, lr_s=9.65e-06 lr_a=9.65e-06, time=8.09s +03/19 20:35:11 INFO train_distill_dimo.py:779] Train H_mean: 10.2500 (9.5398) +03/19 20:35:11 INFO train_distill_dimo.py:779] Train baseline_ema: -0.7928 (-0.7970) +03/19 20:35:11 INFO train_distill_dimo.py:779] Train loss_aux_cond: 1.3444 (1.6916) +03/19 20:35:11 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.6447 (0.7395) +03/19 20:35:11 INFO train_distill_dimo.py:779] Train loss_pg: 3.5237 (2.8513) +03/19 20:35:11 INFO train_distill_dimo.py:779] Train mean_logp_tok: -10.1875 (-9.5988) +03/19 20:35:11 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2968 (9.2951) +03/19 20:35:11 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.8400) +03/19 20:41:47 INFO train_distill_dimo.py:768] Iteration 1750, lr_s=9.62e-06 lr_a=9.62e-06, time=7.76s +03/19 20:41:47 INFO train_distill_dimo.py:779] Train H_mean: 9.7812 (9.3438) +03/19 20:41:47 INFO train_distill_dimo.py:779] Train baseline_ema: -0.7316 (-0.7237) +03/19 20:41:47 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.9492 (1.2953) +03/19 20:41:47 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.6324 (0.7470) +03/19 20:41:47 INFO train_distill_dimo.py:779] Train loss_pg: 2.2726 (0.8503) +03/19 20:41:47 INFO train_distill_dimo.py:779] Train mean_logp_tok: -9.9375 (-9.3422) +03/19 20:41:47 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2994 (9.2954) +03/19 20:41:47 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.7800) +03/19 20:48:23 INFO train_distill_dimo.py:768] Iteration 1800, lr_s=9.59e-06 lr_a=9.59e-06, time=7.75s +03/19 20:48:23 INFO train_distill_dimo.py:779] Train H_mean: 10.0938 (9.7594) +03/19 20:48:23 INFO train_distill_dimo.py:779] Train baseline_ema: -0.6202 (-0.6230) +03/19 20:48:23 INFO 
train_distill_dimo.py:779] Train loss_aux_cond: 1.4082 (1.5517) +03/19 20:48:23 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.7146 (0.7299) +03/19 20:48:23 INFO train_distill_dimo.py:779] Train loss_pg: 1.2075 (1.2249) +03/19 20:48:23 INFO train_distill_dimo.py:779] Train mean_logp_tok: -10.2188 (-9.7575) +03/19 20:48:23 INFO train_distill_dimo.py:779] Train tok_entropy: 9.3004 (9.3006) +03/19 20:48:23 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.7800) +03/19 20:55:00 INFO train_distill_dimo.py:768] Iteration 1850, lr_s=9.56e-06 lr_a=9.56e-06, time=7.77s +03/19 20:55:00 INFO train_distill_dimo.py:779] Train H_mean: 10.1875 (9.8038) +03/19 20:55:00 INFO train_distill_dimo.py:779] Train baseline_ema: -0.6102 (-0.6096) +03/19 20:55:00 INFO train_distill_dimo.py:779] Train loss_aux_cond: 1.6247 (1.7309) +03/19 20:55:00 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.7887 (0.8165) +03/19 20:55:00 INFO train_distill_dimo.py:779] Train loss_pg: 0.5235 (-0.4506) +03/19 20:55:00 INFO train_distill_dimo.py:779] Train mean_logp_tok: -10.2812 (-9.8625) +03/19 20:55:00 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2995 (9.2995) +03/19 20:55:00 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.9400) +03/19 21:01:37 INFO train_distill_dimo.py:768] Iteration 1900, lr_s=9.53e-06 lr_a=9.53e-06, time=8.16s +03/19 21:01:37 INFO train_distill_dimo.py:779] Train H_mean: 9.8125 (9.3181) +03/19 21:01:37 INFO train_distill_dimo.py:779] Train baseline_ema: -0.6233 (-0.6235) +03/19 21:01:37 INFO train_distill_dimo.py:779] Train loss_aux_cond: 1.3972 (1.4972) +03/19 21:01:37 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.7621 (0.7737) +03/19 21:01:37 INFO train_distill_dimo.py:779] Train loss_pg: -0.0261 (-0.5185) +03/19 21:01:37 INFO train_distill_dimo.py:779] Train mean_logp_tok: -9.9375 (-9.3591) +03/19 21:01:37 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2976 (9.2949) +03/19 21:01:37 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (0.9600) +03/19 21:08:13 INFO train_distill_dimo.py:768] Iteration 1950, lr_s=9.49e-06 lr_a=9.49e-06, time=7.75s +03/19 21:08:13 INFO train_distill_dimo.py:779] Train H_mean: 10.1562 (9.6503) +03/19 21:08:13 INFO train_distill_dimo.py:779] Train baseline_ema: -0.6556 (-0.6501) +03/19 21:08:13 INFO train_distill_dimo.py:779] Train loss_aux_cond: 1.4842 (1.6509) +03/19 21:08:13 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.7199 (0.8750) +03/19 21:08:13 INFO train_distill_dimo.py:779] Train loss_pg: 0.7563 (-0.7894) +03/19 21:08:13 INFO train_distill_dimo.py:779] Train mean_logp_tok: -10.0625 (-9.7109) +03/19 21:08:13 INFO train_distill_dimo.py:779] Train tok_entropy: 9.3016 (9.2967) +03/19 21:08:13 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (1.0000) +03/19 21:14:50 INFO train_distill_dimo.py:768] Iteration 2000, lr_s=9.46e-06 lr_a=9.46e-06, time=7.76s +03/19 21:14:50 INFO train_distill_dimo.py:779] Train H_mean: 10.4062 (10.1762) +03/19 21:14:50 INFO train_distill_dimo.py:779] Train baseline_ema: -0.7209 (-0.7108) +03/19 21:14:50 INFO train_distill_dimo.py:779] Train loss_aux_cond: 1.6140 (1.4516) +03/19 21:14:50 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.7647 (0.7777) +03/19 21:14:50 INFO train_distill_dimo.py:779] Train loss_pg: -1.2898 (-1.2132) +03/19 21:14:50 INFO train_distill_dimo.py:779] Train mean_logp_tok: -10.3750 (-10.1644) +03/19 21:14:50 INFO train_distill_dimo.py:779] Train tok_entropy: 9.3100 (9.3035) +03/19 21:14:50 INFO 
train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (1.0000) +03/19 21:14:50 INFO train_distill_dimo.py:701] < PROGRESS: 20.01% | SPEED: 7.905s / step | ETA: 17:34:01 > +03/19 21:15:14 INFO train_distill_dimo.py:754] [save] step=2000 → ./experiments/distill_dimo/checkpoints/checkpoint-2000 +03/19 21:21:50 INFO train_distill_dimo.py:768] Iteration 2050, lr_s=9.42e-06 lr_a=9.42e-06, time=8.14s +03/19 21:21:50 INFO train_distill_dimo.py:779] Train H_mean: 9.9375 (9.1247) +03/19 21:21:50 INFO train_distill_dimo.py:779] Train baseline_ema: -0.7229 (-0.7270) +03/19 21:21:50 INFO train_distill_dimo.py:779] Train loss_aux_cond: 0.9751 (1.1711) +03/19 21:21:50 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.5867 (0.7700) +03/19 21:21:50 INFO train_distill_dimo.py:779] Train loss_pg: 2.9118 (0.8746) +03/19 21:21:50 INFO train_distill_dimo.py:779] Train mean_logp_tok: -10.0625 (-9.1072) +03/19 21:21:50 INFO train_distill_dimo.py:779] Train tok_entropy: 9.2656 (9.2509) +03/19 21:21:50 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (1.0000) +03/19 21:28:25 INFO train_distill_dimo.py:768] Iteration 2100, lr_s=9.39e-06 lr_a=9.39e-06, time=7.75s +03/19 21:28:25 INFO train_distill_dimo.py:779] Train H_mean: 9.9062 (8.6525) +03/19 21:28:25 INFO train_distill_dimo.py:779] Train baseline_ema: -0.7041 (-0.7059) +03/19 21:28:25 INFO train_distill_dimo.py:779] Train loss_aux_cond: 1.3843 (1.4905) +03/19 21:28:25 INFO train_distill_dimo.py:779] Train loss_kd_cond: 0.7626 (0.8242) +03/19 21:28:25 INFO train_distill_dimo.py:779] Train loss_pg: 0.5517 (-0.8516) +03/19 21:28:25 INFO train_distill_dimo.py:779] Train mean_logp_tok: -10.0312 (-8.8609) +03/19 21:28:25 INFO train_distill_dimo.py:779] Train tok_entropy: 9.3001 (9.2931) +03/19 21:28:25 INFO train_distill_dimo.py:779] Train use_guided_ratio: 1.0000 (1.0000) diff --git a/URSA/experiments/distill_dimo_v2/config.yaml b/URSA/experiments/distill_dimo_v2/config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..9f9d3750041716d9699d3efcc297b14b8213ab96 --- /dev/null +++ b/URSA/experiments/distill_dimo_v2/config.yaml @@ -0,0 +1,68 @@ +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo_v2 + log_every: 10 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 49 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.0 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + 
num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml diff --git a/URSA/experiments/distill_dimo_v2/logs/20260319_215451.log b/URSA/experiments/distill_dimo_v2/logs/20260319_215451.log new file mode 100644 index 0000000000000000000000000000000000000000..72e884a0cc2c92bca22daf1bbe41b739cc4d95dc --- /dev/null +++ b/URSA/experiments/distill_dimo_v2/logs/20260319_215451.log @@ -0,0 +1,76 @@ +03/19 21:54:51 INFO train_distill_dimo.py:1049] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo_v2 + log_every: 10 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 49 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.0 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/19 21:54:51 INFO train_distill_dimo.py:160] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
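Note: the config uses ${...} interpolations (lr_max: ${optimizer_student.params.lr}, max_steps: ${training.max_train_steps}), which is OmegaConf syntax. A minimal sketch of resolving them, assuming the file is loaded with OmegaConf (the script's actual loader is not shown):

    from omegaconf import OmegaConf

    cfg = OmegaConf.load("experiments/distill_dimo_v2/config.yaml")
    resolved = OmegaConf.to_container(cfg, resolve=True)
    # ${optimizer_student.params.lr} -> 1e-05, ${training.max_train_steps} -> 10000
    print(resolved["lr_scheduler"]["params"]["lr_max"],
          resolved["lr_scheduler"]["params"]["max_steps"])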
+03/19 21:56:14 INFO train_distill_dimo.py:183] [init] latents_shape=(13,40,64) N=33280 K=64000 CFG=ON +03/19 21:56:25 INFO train_distill_dimo.py:295] [init] verified_native_regime=True geometry=(49×320×512) teacher_cfg_scale=7.0 +03/19 21:56:25 INFO train_distill_dimo.py:313] [init] student params: 1982.17M +03/19 21:56:25 INFO train_distill_dimo.py:316] [init] max_train_steps=10000 batch_size_per_gpu=1 num_processes=8 +03/19 21:56:25 INFO train_distill_dimo.py:831] [train] Starting from step 0 / 10000 diff --git a/URSA/experiments/distill_dimo_v2/logs/20260319_220008.log b/URSA/experiments/distill_dimo_v2/logs/20260319_220008.log new file mode 100644 index 0000000000000000000000000000000000000000..17a82890818bff62f30b56186d8ad9208b996af7 --- /dev/null +++ b/URSA/experiments/distill_dimo_v2/logs/20260319_220008.log @@ -0,0 +1,76 @@ +03/19 22:00:08 INFO train_distill_dimo.py:1049] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo_v2 + log_every: 10 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 49 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.0 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/19 22:00:08 INFO train_distill_dimo.py:160] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
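Note: the logged token counts follow from the latent grid: N = 13·40·64 = 33280 for the 49×320×512 runs, and N = 5·40·64 = 12800 for the earlier 17×320×512 run. A sketch of the arithmetic, assuming the usual 4× temporal / 8× spatial VAE compression (an assumption consistent with both logged shapes, not a quote of the model code):

    def latent_grid(num_frames, height, width, t_stride=4, s_stride=8):
        # assumed compression: (f - 1) // t_stride + 1 latent frames,
        # (h // s_stride) x (w // s_stride) tokens per frame
        t = (num_frames - 1) // t_stride + 1
        h, w = height // s_stride, width // s_stride
        return (t, h, w), t * h * w

    print(latent_grid(49, 320, 512))  # ((13, 40, 64), 33280) -> matches latents_shape / N above
    print(latent_grid(17, 320, 512))  # ((5, 40, 64), 12800)  -> matches the 17-frame run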
+03/19 22:01:35 INFO train_distill_dimo.py:183] [init] latents_shape=(13,40,64) N=33280 K=64000 CFG=ON +03/19 22:02:32 INFO train_distill_dimo.py:295] [init] verified_native_regime=True geometry=(49×320×512) teacher_cfg_scale=7.0 +03/19 22:02:32 INFO train_distill_dimo.py:313] [init] student params: 0.00M +03/19 22:02:32 INFO train_distill_dimo.py:316] [init] max_train_steps=10000 batch_size_per_gpu=1 num_processes=8 +03/19 22:02:32 INFO train_distill_dimo.py:831] [train] Starting from step 0 / 10000 diff --git a/URSA/experiments/distill_dimo_v2/logs/20260319_220819.log b/URSA/experiments/distill_dimo_v2/logs/20260319_220819.log new file mode 100644 index 0000000000000000000000000000000000000000..13256e0b0c5ecd07a3518beb3080a98776d66a34 --- /dev/null +++ b/URSA/experiments/distill_dimo_v2/logs/20260319_220819.log @@ -0,0 +1,76 @@ +03/19 22:08:19 INFO train_distill_dimo.py:1060] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo_v2 + log_every: 10 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 49 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.0 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/19 22:08:19 INFO train_distill_dimo.py:160] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
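Note: a "student params" line like 1982.17M is conventionally computed by summing parameter element counts, as in the sketch below (an assumption, not necessarily the script's code). If computed this way, the 0.00M readings in the later runs mean no parameter tensors were visible to the counter at logging time.

    def param_count_m(model):
        # total parameter elements, in millions
        return sum(p.numel() for p in model.parameters()) / 1e6

    # hypothetical usage: log.info(f"[init] student params: {param_count_m(student):.2f}M")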
+03/19 22:09:44 INFO train_distill_dimo.py:183] [init] latents_shape=(13,40,64) N=33280 K=64000 CFG=ON +03/19 22:10:09 INFO train_distill_dimo.py:295] [init] verified_native_regime=True geometry=(49×320×512) teacher_cfg_scale=7.0 +03/19 22:10:09 INFO train_distill_dimo.py:313] [init] student params: 0.00M +03/19 22:10:09 INFO train_distill_dimo.py:316] [init] max_train_steps=10000 batch_size_per_gpu=1 num_processes=8 +03/19 22:10:09 INFO train_distill_dimo.py:842] [train] Starting from step 0 / 10000 diff --git a/URSA/experiments/distill_dimo_v2/logs/20260319_221518.log b/URSA/experiments/distill_dimo_v2/logs/20260319_221518.log new file mode 100644 index 0000000000000000000000000000000000000000..a9d7606ee6b131027a611b942e9e91a325141091 --- /dev/null +++ b/URSA/experiments/distill_dimo_v2/logs/20260319_221518.log @@ -0,0 +1,71 @@ +03/19 22:15:18 INFO train_distill_dimo.py:1218] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo_v2 + log_every: 10 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 49 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.0 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/19 22:15:18 INFO train_distill_dimo.py:160] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
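Note: the logged learning rates are consistent with diffnext.engine.lr_scheduler.CosineLR doing a linear warmup over warmup_steps followed by cosine decay to lr_min. The sketch below is reconstructed from the logged values, not from the library source; the half-step warmup offset is inferred from lr_s=2.10e-07 at iteration 10 and 1.01e-06 at iteration 50.

    import math

    def cosine_lr(step, lr_max=1.0e-05, lr_min=1.0e-06, warmup_steps=500, max_steps=10000):
        if step < warmup_steps:
            # linear warmup; 2.10e-07 at step 10, 1.01e-06 at step 50, 1.00e-05 at step 500
            return lr_max * (step + 0.5) / warmup_steps
        t = (step - warmup_steps) / (max_steps - warmup_steps)
        return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

    print(f"{cosine_lr(2000):.2e}")  # 9.46e-06, matching the iteration-2000 log line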
diff --git a/URSA/experiments/distill_dimo_v2/logs/20260319_221954.log b/URSA/experiments/distill_dimo_v2/logs/20260319_221954.log new file mode 100644 index 0000000000000000000000000000000000000000..47ad974179816123118758aef00ace3290830e04 --- /dev/null +++ b/URSA/experiments/distill_dimo_v2/logs/20260319_221954.log @@ -0,0 +1,76 @@ +03/19 22:19:54 INFO train_distill_dimo.py:1218] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo_v2 + log_every: 10 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 49 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.0 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/19 22:19:54 INFO train_distill_dimo.py:160] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
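Note: enable_teacher_cfg / teacher_cfg_scale suggest the teacher targets are built with classifier-free guidance; a minimal sketch of the standard combination (whether it is applied to logits or probabilities, and the role of teacher_cfg_trunc, are assumptions here):

    def guided_teacher_logits(logits_cond, logits_uncond, cfg_scale=7.0):
        # standard CFG: push the conditional prediction away from the
        # unconditional one by cfg_scale (7.0 per teacher_cfg_scale)
        return logits_uncond + cfg_scale * (logits_cond - logits_uncond)

The use_guided_ratio metric climbing from 0.0 to a steady 1.0 by roughly step 2000 in the first log is consistent with teacher_cfg_warmup_steps: 2000 ramping how often guidance is applied.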
+03/19 22:21:15 INFO train_distill_dimo.py:183] [init] latents_shape=(13,40,64) N=33280 K=64000 CFG=ON +03/19 22:21:41 INFO train_distill_dimo.py:295] [init] verified_native_regime=True geometry=(49×320×512) teacher_cfg_scale=7.0 +03/19 22:21:41 INFO train_distill_dimo.py:313] [init] student params: 0.00M +03/19 22:21:41 INFO train_distill_dimo.py:316] [init] max_train_steps=10000 batch_size_per_gpu=1 num_processes=8 +03/19 22:21:41 INFO train_distill_dimo.py:1000] [train] Starting from step 0 / 10000 diff --git a/URSA/experiments/distill_dimo_v2/logs/20260319_222452.log b/URSA/experiments/distill_dimo_v2/logs/20260319_222452.log new file mode 100644 index 0000000000000000000000000000000000000000..8379f52f54ef4d90ec6f36e54f603a637f90ab54 --- /dev/null +++ b/URSA/experiments/distill_dimo_v2/logs/20260319_222452.log @@ -0,0 +1,85 @@ +03/19 22:24:52 INFO train_distill_dimo.py:1218] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo_v2 + log_every: 10 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 49 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.0 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/19 22:24:52 INFO train_distill_dimo.py:160] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
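Note: the "< PROGRESS | SPEED | ETA >" lines logged during training render exactly like Python's str(timedelta). A sketch of the formatting, inferred from the log lines (e.g. 1.01% at step 101 of 10000 with 45.784 s/step); the ETA lands within a minute of the logged "5 days, 5:54:23" since the script's internal average is not exactly the displayed speed:

    import datetime

    def progress_line(step, max_steps, avg_step_time):
        pct = 100.0 * step / max_steps
        eta = datetime.timedelta(seconds=round((max_steps - step) * avg_step_time))
        return f"< PROGRESS: {pct:.2f}% | SPEED: {avg_step_time:.3f}s / step | ETA: {eta} >"

    print(progress_line(101, 10000, 45.784))
    # < PROGRESS: 1.01% | SPEED: 45.784s / step | ETA: 5 days, 5:53:36 >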
+03/19 22:26:16 INFO train_distill_dimo.py:183] [init] latents_shape=(13,40,64) N=33280 K=64000 CFG=ON +03/19 22:26:42 INFO train_distill_dimo.py:295] [init] verified_native_regime=True geometry=(49×320×512) teacher_cfg_scale=7.0 +03/19 22:26:42 INFO train_distill_dimo.py:313] [init] student params: 0.00M +03/19 22:26:42 INFO train_distill_dimo.py:316] [init] max_train_steps=10000 batch_size_per_gpu=1 num_processes=8 +03/19 22:26:42 INFO train_distill_dimo.py:1000] [train] Starting from step 0 / 10000 +03/19 22:34:18 INFO train_distill_dimo.py:1081] Iteration 10, lr_s=2.10e-07 lr_a=2.10e-07, time=39.43s +03/19 22:34:18 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/19 22:34:18 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/19 22:34:18 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 10.1250 (9.1281) +03/19 22:34:18 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0025 (0.0027) +03/19 22:34:18 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0000 (0.0000) +03/19 22:34:18 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/19 22:34:18 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.8741 (9.8449) +03/19 22:34:18 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) diff --git a/URSA/experiments/distill_dimo_v2/logs/20260319_224018.log b/URSA/experiments/distill_dimo_v2/logs/20260319_224018.log new file mode 100644 index 0000000000000000000000000000000000000000..c4aedeaa986f26f17e6064904e75e90f6154f522 --- /dev/null +++ b/URSA/experiments/distill_dimo_v2/logs/20260319_224018.log @@ -0,0 +1,1159 @@ +03/19 22:40:18 INFO train_distill_dimo.py:1218] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo_v2 + log_every: 10 + save_every: 1000 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 49 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.0 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/19 22:40:18 INFO train_distill_dimo.py:160] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
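Note: the paired loss_pg / baseline_ema metrics read like a REINFORCE-style policy-gradient term with an exponential-moving-average baseline for variance reduction. The sketch below shows that generic technique only; the actual reward signal, decay, and sign convention in train_distill_dimo.py are not shown in the logs, so every detail here is an assumption.

    import torch

    class PGWithEMABaseline:
        """REINFORCE with an EMA baseline (generic sketch, not the script's code)."""

        def __init__(self, decay=0.99):
            self.decay = decay
            self.baseline = 0.0  # corresponds to the logged baseline_ema

        def loss(self, logp, reward):
            advantage = reward - self.baseline
            self.baseline = self.decay * self.baseline + (1.0 - self.decay) * float(reward.mean())
            # detach the advantage so gradients flow only through log-probs
            return -(advantage.detach() * logp).mean()  # corresponds to loss_pg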
+03/19 22:41:51 INFO train_distill_dimo.py:183] [init] latents_shape=(13,40,64) N=33280 K=64000 CFG=ON +03/19 22:42:21 INFO train_distill_dimo.py:295] [init] verified_native_regime=True geometry=(49×320×512) teacher_cfg_scale=7.0 +03/19 22:42:21 INFO train_distill_dimo.py:313] [init] student params: 0.00M +03/19 22:42:21 INFO train_distill_dimo.py:316] [init] max_train_steps=10000 batch_size_per_gpu=1 num_processes=8 +03/19 22:42:21 INFO train_distill_dimo.py:1000] [train] Starting from step 0 / 10000 +03/19 22:50:27 INFO train_distill_dimo.py:1081] Iteration 10, lr_s=2.10e-07 lr_a=2.10e-07, time=43.91s +03/19 22:50:27 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/19 22:50:27 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/19 22:50:27 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 10.0938 (9.1156) +03/19 22:50:27 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0024 (0.0027) +03/19 22:50:27 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0000 (0.0000) +03/19 22:50:27 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/19 22:50:27 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.8735 (9.8445) +03/19 22:50:27 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/19 22:58:12 INFO train_distill_dimo.py:1081] Iteration 20, lr_s=4.10e-07 lr_a=4.10e-07, time=51.29s +03/19 22:58:12 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/19 22:58:12 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/19 22:58:12 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.6719 (7.3625) +03/19 22:58:12 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0033 (0.0034) +03/19 22:58:12 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0000 (0.0007) +03/19 22:58:12 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/19 22:58:12 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.8745 (9.8733) +03/19 22:58:12 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/19 23:05:49 INFO train_distill_dimo.py:1081] Iteration 30, lr_s=6.09e-07 lr_a=6.09e-07, time=45.81s +03/19 23:05:49 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/19 23:05:49 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/19 23:05:49 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.5156 (6.5250) +03/19 23:05:49 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0045 (0.0039) +03/19 23:05:49 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0004 (0.0065) +03/19 23:05:49 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/19 23:05:49 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.8228 (9.8079) +03/19 23:05:49 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/19 23:13:13 INFO train_distill_dimo.py:1081] Iteration 40, lr_s=8.09e-07 lr_a=8.09e-07, time=44.93s +03/19 23:13:13 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/19 23:13:13 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/19 23:13:13 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.1797 (5.6296) +03/19 23:13:13 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0051 (0.0047) +03/19 23:13:13 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0155 (0.1099) +03/19 23:13:13 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/19 23:13:13 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.8918 (9.7548) +03/19 23:13:13 INFO 
train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/19 23:20:44 INFO train_distill_dimo.py:1081] Iteration 50, lr_s=1.01e-06 lr_a=1.01e-06, time=42.23s +03/19 23:20:44 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/19 23:20:44 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/19 23:20:44 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 9.4688 (8.1391) +03/19 23:20:44 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0080 (0.0110) +03/19 23:20:44 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0000 (0.0053) +03/19 23:20:44 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/19 23:20:44 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.8681 (9.8427) +03/19 23:20:44 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/19 23:28:13 INFO train_distill_dimo.py:1081] Iteration 60, lr_s=1.21e-06 lr_a=1.21e-06, time=44.71s +03/19 23:28:13 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/19 23:28:13 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/19 23:28:13 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.9219 (6.4961) +03/19 23:28:13 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0084 (0.0266) +03/19 23:28:13 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0009 (0.0939) +03/19 23:28:13 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/19 23:28:13 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.7152 (9.7023) +03/19 23:28:13 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/19 23:35:46 INFO train_distill_dimo.py:1081] Iteration 70, lr_s=1.41e-06 lr_a=1.41e-06, time=43.27s +03/19 23:35:46 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/19 23:35:46 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/19 23:35:46 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.5156 (5.1241) +03/19 23:35:46 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0102 (0.0117) +03/19 23:35:46 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0044 (0.0758) +03/19 23:35:46 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/19 23:35:46 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.5759 (9.5430) +03/19 23:35:46 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/19 23:43:28 INFO train_distill_dimo.py:1081] Iteration 80, lr_s=1.61e-06 lr_a=1.61e-06, time=50.35s +03/19 23:43:28 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/19 23:43:28 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/19 23:43:28 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 0.4082 (2.3667) +03/19 23:43:28 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0437 (0.0786) +03/19 23:43:28 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0576 (0.0823) +03/19 23:43:28 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/19 23:43:28 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.0318 (7.9406) +03/19 23:43:28 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/19 23:51:07 INFO train_distill_dimo.py:1081] Iteration 90, lr_s=1.81e-06 lr_a=1.81e-06, time=43.07s +03/19 23:51:07 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/19 23:51:07 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/19 23:51:07 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 2.9453 (2.6166) +03/19 23:51:07 INFO 
train_distill_dimo.py:1092] Train loss_kd_cond: 0.0349 (0.0626) +03/19 23:51:07 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0094 (0.0174) +03/19 23:51:07 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/19 23:51:07 INFO train_distill_dimo.py:1092] Train tok_entropy: 5.2207 (5.3351) +03/19 23:51:07 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/19 23:58:40 INFO train_distill_dimo.py:1081] Iteration 100, lr_s=2.01e-06 lr_a=2.01e-06, time=45.09s +03/19 23:58:40 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/19 23:58:40 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/19 23:58:40 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 1.3672 (2.4464) +03/19 23:58:40 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0258 (0.1049) +03/19 23:58:40 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0188 (0.0698) +03/19 23:58:40 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/19 23:58:40 INFO train_distill_dimo.py:1092] Train tok_entropy: 6.9802 (6.9547) +03/19 23:58:40 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/19 23:58:40 INFO train_distill_dimo.py:1014] < PROGRESS: 1.01% | SPEED: 45.784s / step | ETA: 5 days, 5:54:23 > +03/20 00:06:15 INFO train_distill_dimo.py:1081] Iteration 110, lr_s=2.21e-06 lr_a=2.21e-06, time=49.92s +03/20 00:06:15 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 00:06:15 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 00:06:15 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.0781 (3.9566) +03/20 00:06:15 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0319 (0.1435) +03/20 00:06:15 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0054 (0.0457) +03/20 00:06:15 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 00:06:15 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.6731 (7.6430) +03/20 00:06:15 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/20 00:13:55 INFO train_distill_dimo.py:1081] Iteration 120, lr_s=2.41e-06 lr_a=2.41e-06, time=44.47s +03/20 00:13:55 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 00:13:55 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 00:13:55 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 3.1250 (3.0232) +03/20 00:13:55 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0558 (1.0996) +03/20 00:13:55 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0082 (0.0552) +03/20 00:13:55 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 00:13:55 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.0244 (7.0505) +03/20 00:13:55 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.1000) +03/20 00:21:50 INFO train_distill_dimo.py:1081] Iteration 130, lr_s=2.61e-06 lr_a=2.61e-06, time=50.58s +03/20 00:21:50 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 00:21:50 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 00:21:50 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 3.5234 (3.7231) +03/20 00:21:50 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0442 (0.0540) +03/20 00:21:50 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0108 (0.0140) +03/20 00:21:50 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 00:21:50 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.3584 (7.4204) +03/20 
00:21:50 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/20 00:29:29 INFO train_distill_dimo.py:1081] Iteration 140, lr_s=2.81e-06 lr_a=2.81e-06, time=49.88s +03/20 00:29:29 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 00:29:29 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 00:29:29 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.3750 (5.2609) +03/20 00:29:29 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0390 (0.0496) +03/20 00:29:29 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0005 (0.0174) +03/20 00:29:29 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 00:29:29 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.4919 (7.4777) +03/20 00:29:29 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/20 00:37:06 INFO train_distill_dimo.py:1081] Iteration 150, lr_s=3.01e-06 lr_a=3.01e-06, time=51.41s +03/20 00:37:06 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 00:37:06 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 00:37:06 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 3.3047 (3.9193) +03/20 00:37:06 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0768 (0.0836) +03/20 00:37:06 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0074 (0.0237) +03/20 00:37:06 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 00:37:06 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.0645 (7.2162) +03/20 00:37:06 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/20 00:44:39 INFO train_distill_dimo.py:1081] Iteration 160, lr_s=3.21e-06 lr_a=3.21e-06, time=44.95s +03/20 00:44:39 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 00:44:39 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 00:44:39 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.2344 (6.0094) +03/20 00:44:39 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0520 (0.1233) +03/20 00:44:39 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0006 (0.0018) +03/20 00:44:39 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 00:44:39 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.5389 (7.5861) +03/20 00:44:39 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/20 00:52:04 INFO train_distill_dimo.py:1081] Iteration 170, lr_s=3.41e-06 lr_a=3.41e-06, time=43.67s +03/20 00:52:04 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 00:52:04 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 00:52:04 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 3.3750 (3.7849) +03/20 00:52:04 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0413 (0.0858) +03/20 00:52:04 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0127 (0.0359) +03/20 00:52:04 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 00:52:04 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.7201 (7.7738) +03/20 00:52:04 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 00:59:31 INFO train_distill_dimo.py:1081] Iteration 180, lr_s=3.61e-06 lr_a=3.61e-06, time=43.14s +03/20 00:59:31 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 00:59:31 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 00:59:31 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.3594 (4.1201) 
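Note: tok_entropy stays below ln K = ln 64000 ≈ 11.07, which matches a mean per-token entropy in nats over the K=64000 codebook. A sketch of that definition (assumed, but consistent with the logged scale):

    import torch.nn.functional as F

    def token_entropy(logits):
        # mean entropy (nats) of the per-token categorical distribution over the codebook
        logp = F.log_softmax(logits.float(), dim=-1)
        return -(logp.exp() * logp).sum(dim=-1).mean()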
+03/20 00:59:31 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0478 (0.0698) +03/20 00:59:31 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0088 (0.0353) +03/20 00:59:31 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 00:59:31 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.5406 (7.6323) +03/20 00:59:31 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.1000) +03/20 01:07:11 INFO train_distill_dimo.py:1081] Iteration 190, lr_s=3.81e-06 lr_a=3.81e-06, time=45.34s +03/20 01:07:11 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 01:07:11 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 01:07:11 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.6719 (5.1287) +03/20 01:07:11 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0634 (0.0971) +03/20 01:07:11 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0014 (0.0166) +03/20 01:07:11 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 01:07:11 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.5447 (7.5273) +03/20 01:07:11 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/20 01:14:44 INFO train_distill_dimo.py:1081] Iteration 200, lr_s=4.01e-06 lr_a=4.01e-06, time=45.00s +03/20 01:14:44 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 01:14:44 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 01:14:44 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 3.8750 (3.7681) +03/20 01:14:44 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0423 (0.0855) +03/20 01:14:44 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0051 (0.0142) +03/20 01:14:44 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 01:14:44 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.7103 (7.8444) +03/20 01:14:44 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/20 01:14:44 INFO train_distill_dimo.py:1014] < PROGRESS: 2.01% | SPEED: 45.714s / step | ETA: 5 days, 4:26:33 > +03/20 01:22:21 INFO train_distill_dimo.py:1081] Iteration 210, lr_s=4.21e-06 lr_a=4.21e-06, time=50.24s +03/20 01:22:21 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 01:22:21 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 01:22:21 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 2.8984 (3.0091) +03/20 01:22:21 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0608 (0.1012) +03/20 01:22:21 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0195 (0.0391) +03/20 01:22:21 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 01:22:21 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.1691 (8.0695) +03/20 01:22:21 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/20 01:29:54 INFO train_distill_dimo.py:1081] Iteration 220, lr_s=4.41e-06 lr_a=4.41e-06, time=45.74s +03/20 01:29:54 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 01:29:54 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 01:29:54 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.5000 (4.6383) +03/20 01:29:54 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0871 (0.2439) +03/20 01:29:54 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0138 (0.0536) +03/20 01:29:54 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 01:29:54 INFO train_distill_dimo.py:1092] Train tok_entropy: 
7.3433 (7.1499) +03/20 01:29:54 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 01:37:15 INFO train_distill_dimo.py:1081] Iteration 230, lr_s=4.61e-06 lr_a=4.61e-06, time=44.29s +03/20 01:37:15 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 01:37:15 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 01:37:15 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.9688 (4.1270) +03/20 01:37:15 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.2464 (0.2936) +03/20 01:37:15 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0049 (0.0197) +03/20 01:37:15 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 01:37:15 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.0116 (7.2751) +03/20 01:37:15 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.4000) +03/20 01:44:51 INFO train_distill_dimo.py:1081] Iteration 240, lr_s=4.81e-06 lr_a=4.81e-06, time=45.52s +03/20 01:44:51 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 01:44:51 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 01:44:51 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.6250 (5.8015) +03/20 01:44:51 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.0969 (0.1973) +03/20 01:44:51 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0008 (0.0330) +03/20 01:44:51 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 01:44:51 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.5558 (7.3534) +03/20 01:44:51 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/20 01:52:27 INFO train_distill_dimo.py:1081] Iteration 250, lr_s=5.00e-06 lr_a=5.00e-06, time=42.07s +03/20 01:52:27 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 01:52:27 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 01:52:27 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.0469 (6.1578) +03/20 01:52:27 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.3943 (1.8963) +03/20 01:52:27 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0053 (0.0207) +03/20 01:52:27 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 01:52:27 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.7281 (8.3933) +03/20 01:52:27 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 01:59:53 INFO train_distill_dimo.py:1081] Iteration 260, lr_s=5.20e-06 lr_a=5.20e-06, time=45.09s +03/20 01:59:53 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 01:59:53 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 01:59:53 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.6406 (5.1093) +03/20 01:59:53 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.4350 (0.9890) +03/20 01:59:53 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0243 (0.0464) +03/20 01:59:53 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 01:59:53 INFO train_distill_dimo.py:1092] Train tok_entropy: 6.3573 (5.9346) +03/20 01:59:53 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.1000) +03/20 02:07:20 INFO train_distill_dimo.py:1081] Iteration 270, lr_s=5.40e-06 lr_a=5.40e-06, time=49.73s +03/20 02:07:20 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 02:07:20 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 02:07:20 INFO train_distill_dimo.py:1092] Train 
loss_aux_cond: 3.1797 (2.7005) +03/20 02:07:20 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.3771 (1.0916) +03/20 02:07:20 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0301 (0.0656) +03/20 02:07:20 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 02:07:20 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.2516 (7.0733) +03/20 02:07:20 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 02:14:57 INFO train_distill_dimo.py:1081] Iteration 280, lr_s=5.60e-06 lr_a=5.60e-06, time=48.03s +03/20 02:14:57 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 02:14:57 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 02:14:57 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.5469 (5.2379) +03/20 02:14:57 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.1857 (0.7894) +03/20 02:14:57 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0072 (0.0337) +03/20 02:14:57 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 02:14:57 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.4421 (8.3623) +03/20 02:14:57 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 02:22:36 INFO train_distill_dimo.py:1081] Iteration 290, lr_s=5.80e-06 lr_a=5.80e-06, time=44.85s +03/20 02:22:36 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 02:22:36 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 02:22:36 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 2.8125 (4.2211) +03/20 02:22:36 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.8818 (2.1146) +03/20 02:22:36 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0331 (0.0764) +03/20 02:22:36 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 02:22:36 INFO train_distill_dimo.py:1092] Train tok_entropy: 6.6995 (6.7708) +03/20 02:22:36 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 02:30:04 INFO train_distill_dimo.py:1081] Iteration 300, lr_s=6.00e-06 lr_a=6.00e-06, time=45.21s +03/20 02:30:04 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 02:30:04 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 02:30:04 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.1094 (4.4225) +03/20 02:30:04 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.2510 (0.8048) +03/20 02:30:04 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0029 (0.0624) +03/20 02:30:04 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 02:30:04 INFO train_distill_dimo.py:1092] Train tok_entropy: 6.5438 (6.6848) +03/20 02:30:04 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 02:30:04 INFO train_distill_dimo.py:1014] < PROGRESS: 3.01% | SPEED: 45.543s / step | ETA: 5 days, 2:42:50 > +03/20 02:37:53 INFO train_distill_dimo.py:1081] Iteration 310, lr_s=6.20e-06 lr_a=6.20e-06, time=43.88s +03/20 02:37:53 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 02:37:53 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 02:37:53 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.0781 (5.8598) +03/20 02:37:53 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.1059 (0.1561) +03/20 02:37:53 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0008 (0.0130) +03/20 02:37:53 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 02:37:53 INFO 
train_distill_dimo.py:1092] Train tok_entropy: 7.3003 (7.5906) +03/20 02:37:53 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/20 02:45:35 INFO train_distill_dimo.py:1081] Iteration 320, lr_s=6.40e-06 lr_a=6.40e-06, time=45.21s +03/20 02:45:35 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 02:45:35 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 02:45:35 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 8.0312 (7.9156) +03/20 02:45:35 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.1288 (0.6134) +03/20 02:45:35 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0003 (0.0032) +03/20 02:45:35 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 02:45:35 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.9505 (8.8283) +03/20 02:45:35 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 02:53:12 INFO train_distill_dimo.py:1081] Iteration 330, lr_s=6.60e-06 lr_a=6.60e-06, time=45.79s +03/20 02:53:12 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 02:53:12 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 02:53:12 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.8594 (6.0594) +03/20 02:53:12 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.1402 (0.8018) +03/20 02:53:12 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0072 (0.0453) +03/20 02:53:12 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 02:53:12 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.7755 (8.9330) +03/20 02:53:12 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 03:00:46 INFO train_distill_dimo.py:1081] Iteration 340, lr_s=6.80e-06 lr_a=6.80e-06, time=44.60s +03/20 03:00:46 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 03:00:46 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 03:00:46 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 3.6094 (4.0483) +03/20 03:00:46 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.1044 (0.6817) +03/20 03:00:46 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0116 (0.0444) +03/20 03:00:46 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 03:00:46 INFO train_distill_dimo.py:1092] Train tok_entropy: 6.8156 (7.0495) +03/20 03:00:46 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.0000) +03/20 03:08:20 INFO train_distill_dimo.py:1081] Iteration 350, lr_s=7.00e-06 lr_a=7.00e-06, time=43.51s +03/20 03:08:20 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 03:08:20 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 03:08:20 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.6562 (5.7943) +03/20 03:08:20 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.3340 (0.3292) +03/20 03:08:20 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0060 (0.0092) +03/20 03:08:20 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 03:08:20 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.8221 (8.1235) +03/20 03:08:20 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 03:15:56 INFO train_distill_dimo.py:1081] Iteration 360, lr_s=7.20e-06 lr_a=7.20e-06, time=46.40s +03/20 03:15:56 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 03:15:56 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 03:15:56 
INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.1406 (5.1776) +03/20 03:15:56 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.3505 (0.5587) +03/20 03:15:56 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0037 (0.0373) +03/20 03:15:56 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 03:15:56 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.3450 (7.2776) +03/20 03:15:56 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.1000) +03/20 03:23:22 INFO train_distill_dimo.py:1081] Iteration 370, lr_s=7.40e-06 lr_a=7.40e-06, time=43.70s +03/20 03:23:22 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 03:23:22 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 03:23:22 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 8.0938 (7.5500) +03/20 03:23:22 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.4798 (0.9818) +03/20 03:23:22 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0004 (0.0190) +03/20 03:23:22 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 03:23:22 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.6113 (8.3422) +03/20 03:23:22 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.1000) +03/20 03:30:58 INFO train_distill_dimo.py:1081] Iteration 380, lr_s=7.60e-06 lr_a=7.60e-06, time=48.79s +03/20 03:30:58 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 03:30:58 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 03:30:58 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 3.7266 (4.1433) +03/20 03:30:58 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.5491 (0.6932) +03/20 03:30:58 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0140 (0.0398) +03/20 03:30:58 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 03:30:58 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.8568 (7.9581) +03/20 03:30:58 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 03:38:17 INFO train_distill_dimo.py:1081] Iteration 390, lr_s=7.80e-06 lr_a=7.80e-06, time=41.55s +03/20 03:38:17 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 03:38:17 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 03:38:17 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 1.7539 (3.1141) +03/20 03:38:17 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.3121 (1.0718) +03/20 03:38:17 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0503 (0.0952) +03/20 03:38:17 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 03:38:17 INFO train_distill_dimo.py:1092] Train tok_entropy: 6.8012 (6.3875) +03/20 03:38:17 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 03:45:40 INFO train_distill_dimo.py:1081] Iteration 400, lr_s=8.00e-06 lr_a=8.00e-06, time=44.93s +03/20 03:45:40 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 03:45:40 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 03:45:40 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.7031 (6.6437) +03/20 03:45:40 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.1420 (0.2098) +03/20 03:45:40 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0007 (0.0067) +03/20 03:45:40 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 03:45:40 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.0228 (8.5761) +03/20 03:45:40 INFO 
train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 03:45:40 INFO train_distill_dimo.py:1014] < PROGRESS: 4.01% | SPEED: 45.496s / step | ETA: 5 days, 1:19:22 > +03/20 03:53:31 INFO train_distill_dimo.py:1081] Iteration 410, lr_s=8.20e-06 lr_a=8.20e-06, time=43.76s +03/20 03:53:32 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 03:53:32 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 03:53:32 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.7656 (5.1445) +03/20 03:53:32 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.1607 (0.2926) +03/20 03:53:32 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0025 (0.0109) +03/20 03:53:32 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 03:53:32 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.3283 (7.3340) +03/20 03:53:32 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 04:01:01 INFO train_distill_dimo.py:1081] Iteration 420, lr_s=8.40e-06 lr_a=8.40e-06, time=43.97s +03/20 04:01:01 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 04:01:01 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 04:01:01 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.4375 (5.3688) +03/20 04:01:01 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.3055 (0.4355) +03/20 04:01:01 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0021 (0.0330) +03/20 04:01:01 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 04:01:01 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.7046 (8.5106) +03/20 04:01:01 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.1000) +03/20 04:08:51 INFO train_distill_dimo.py:1081] Iteration 430, lr_s=8.60e-06 lr_a=8.60e-06, time=48.80s +03/20 04:08:51 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 04:08:51 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 04:08:51 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 8.1875 (7.2070) +03/20 04:08:51 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.6433 (1.8750) +03/20 04:08:51 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0004 (0.0332) +03/20 04:08:51 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 04:08:51 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.6339 (8.1303) +03/20 04:08:51 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 04:16:40 INFO train_distill_dimo.py:1081] Iteration 440, lr_s=8.80e-06 lr_a=8.80e-06, time=48.92s +03/20 04:16:40 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 04:16:40 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 04:16:40 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.2891 (4.3883) +03/20 04:16:40 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.2214 (1.4408) +03/20 04:16:40 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0106 (0.0407) +03/20 04:16:40 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 04:16:40 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.5702 (7.4421) +03/20 04:16:40 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 04:24:12 INFO train_distill_dimo.py:1081] Iteration 450, lr_s=9.00e-06 lr_a=9.00e-06, time=46.80s +03/20 04:24:12 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 04:24:12 INFO train_distill_dimo.py:1092] Train baseline_ema: 
0.0000 (0.0000) +03/20 04:24:12 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 8.2656 (6.4789) +03/20 04:24:12 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.6236 (1.5710) +03/20 04:24:12 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0012 (0.1003) +03/20 04:24:12 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 04:24:12 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.1042 (8.9764) +03/20 04:24:12 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 04:31:38 INFO train_distill_dimo.py:1081] Iteration 460, lr_s=9.20e-06 lr_a=9.20e-06, time=41.94s +03/20 04:31:38 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 04:31:38 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 04:31:38 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.6094 (5.8539) +03/20 04:31:38 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.9359 (1.5168) +03/20 04:31:38 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0027 (0.0606) +03/20 04:31:38 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 04:31:38 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.1762 (7.6886) +03/20 04:31:38 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.5000 (0.5000) +03/20 04:39:13 INFO train_distill_dimo.py:1081] Iteration 470, lr_s=9.40e-06 lr_a=9.40e-06, time=45.58s +03/20 04:39:13 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 04:39:13 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 04:39:13 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.6953 (5.4742) +03/20 04:39:13 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.2317 (1.2014) +03/20 04:39:13 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0313 (0.0828) +03/20 04:39:13 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 04:39:13 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.0399 (7.9026) +03/20 04:39:13 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 04:46:56 INFO train_distill_dimo.py:1081] Iteration 480, lr_s=9.60e-06 lr_a=9.60e-06, time=53.04s +03/20 04:46:56 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 04:46:56 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 04:46:56 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.0938 (6.7719) +03/20 04:46:56 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.2428 (0.9207) +03/20 04:46:56 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0010 (0.0094) +03/20 04:46:56 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 04:46:56 INFO train_distill_dimo.py:1092] Train tok_entropy: 6.4978 (6.5766) +03/20 04:46:56 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 04:54:46 INFO train_distill_dimo.py:1081] Iteration 490, lr_s=9.80e-06 lr_a=9.80e-06, time=43.33s +03/20 04:54:46 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 04:54:46 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 04:54:46 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.2500 (7.4234) +03/20 04:54:46 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.2046 (0.5504) +03/20 04:54:46 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0031 (0.0415) +03/20 04:54:46 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 04:54:46 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.8304 (9.6891) 
+03/20 04:54:46 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 05:02:46 INFO train_distill_dimo.py:1081] Iteration 500, lr_s=1.00e-05 lr_a=1.00e-05, time=46.37s +03/20 05:02:46 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 05:02:46 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 05:02:46 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.6797 (4.3512) +03/20 05:02:46 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.2134 (0.2420) +03/20 05:02:46 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0062 (0.0093) +03/20 05:02:46 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 05:02:46 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.7650 (7.9889) +03/20 05:02:46 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.1000) +03/20 05:02:46 INFO train_distill_dimo.py:1014] < PROGRESS: 5.01% | SPEED: 45.648s / step | ETA: 5 days, 0:27:35 > +03/20 05:10:31 INFO train_distill_dimo.py:1081] Iteration 510, lr_s=1.00e-05 lr_a=1.00e-05, time=46.33s +03/20 05:10:31 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 05:10:31 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 05:10:31 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.4531 (6.0531) +03/20 05:10:31 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.5274 (1.8262) +03/20 05:10:31 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0045 (0.0276) +03/20 05:10:31 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 05:10:31 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.9632 (8.8461) +03/20 05:10:31 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 05:18:09 INFO train_distill_dimo.py:1081] Iteration 520, lr_s=1.00e-05 lr_a=1.00e-05, time=45.33s +03/20 05:18:09 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 05:18:09 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 05:18:09 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.2031 (4.9529) +03/20 05:18:09 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.9471 (2.4811) +03/20 05:18:09 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0573 (0.0784) +03/20 05:18:09 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 05:18:09 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.0564 (6.6203) +03/20 05:18:09 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 05:25:59 INFO train_distill_dimo.py:1081] Iteration 530, lr_s=1.00e-05 lr_a=1.00e-05, time=43.86s +03/20 05:25:59 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 05:25:59 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 05:25:59 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.7500 (4.7043) +03/20 05:25:59 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 1.2891 (2.0371) +03/20 05:25:59 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0336 (0.1065) +03/20 05:25:59 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 05:25:59 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.2046 (8.1271) +03/20 05:25:59 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.4000) +03/20 05:33:42 INFO train_distill_dimo.py:1081] Iteration 540, lr_s=1.00e-05 lr_a=1.00e-05, time=46.11s +03/20 05:33:42 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 05:33:42 INFO train_distill_dimo.py:1092] 
Train baseline_ema: 0.0000 (0.0000) +03/20 05:33:42 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.1406 (6.6063) +03/20 05:33:42 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.8766 (1.1864) +03/20 05:33:42 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0019 (0.0303) +03/20 05:33:42 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 05:33:42 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.3780 (9.0423) +03/20 05:33:42 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 05:36:47 WARNING train_distill_dimo.py:958] [COLLAPSE] step=544 tok_H=1.651 init=9.865 ratio=0.17 < 0.2. Try increasing lambda_ent. +03/20 05:37:33 WARNING train_distill_dimo.py:958] [COLLAPSE] step=545 tok_H=1.352 init=9.865 ratio=0.14 < 0.2. Try increasing lambda_ent. +03/20 05:41:11 INFO train_distill_dimo.py:1081] Iteration 550, lr_s=1.00e-05 lr_a=1.00e-05, time=43.10s +03/20 05:41:11 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 05:41:11 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 05:41:11 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 3.6172 (4.2750) +03/20 05:41:11 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.8143 (0.9196) +03/20 05:41:11 INFO train_distill_dimo.py:1092] Train loss_pg: 0.1458 (0.1507) +03/20 05:41:11 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 05:41:11 INFO train_distill_dimo.py:1092] Train tok_entropy: 2.9295 (2.9032) +03/20 05:41:11 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.1000) +03/20 05:48:39 INFO train_distill_dimo.py:1081] Iteration 560, lr_s=1.00e-05 lr_a=1.00e-05, time=44.45s +03/20 05:48:39 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 05:48:39 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 05:48:39 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 3.8750 (4.9148) +03/20 05:48:39 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.3990 (0.5319) +03/20 05:48:39 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0148 (0.0236) +03/20 05:48:39 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 05:48:39 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.9413 (7.9273) +03/20 05:48:39 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 05:56:27 INFO train_distill_dimo.py:1081] Iteration 570, lr_s=1.00e-05 lr_a=1.00e-05, time=45.03s +03/20 05:56:27 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 05:56:27 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 05:56:27 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.4219 (6.1572) +03/20 05:56:27 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.3066 (0.3371) +03/20 05:56:27 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0005 (0.0406) +03/20 05:56:27 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 05:56:27 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.3249 (9.1676) +03/20 05:56:27 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.1000) +03/20 06:04:19 INFO train_distill_dimo.py:1081] Iteration 580, lr_s=1.00e-05 lr_a=1.00e-05, time=44.13s +03/20 06:04:19 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 06:04:19 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 06:04:19 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.6641 (4.0944) +03/20 06:04:19 INFO 
train_distill_dimo.py:1092] Train loss_kd_cond: 0.4150 (1.1870) +03/20 06:04:19 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0068 (0.0357) +03/20 06:04:19 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 06:04:19 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.4119 (7.4306) +03/20 06:04:19 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 06:11:51 INFO train_distill_dimo.py:1081] Iteration 590, lr_s=1.00e-05 lr_a=1.00e-05, time=47.88s +03/20 06:11:51 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 06:11:51 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 06:11:51 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 8.2656 (6.4578) +03/20 06:11:51 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.3498 (0.7396) +03/20 06:11:51 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0011 (0.0417) +03/20 06:11:51 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 06:11:51 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.3980 (8.9929) +03/20 06:11:51 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 06:19:25 INFO train_distill_dimo.py:1081] Iteration 600, lr_s=1.00e-05 lr_a=1.00e-05, time=46.51s +03/20 06:19:25 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 06:19:25 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 06:19:25 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.0938 (6.4938) +03/20 06:19:25 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.2773 (0.3312) +03/20 06:19:25 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0016 (0.0017) +03/20 06:19:25 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 06:19:25 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.9258 (8.0706) +03/20 06:19:25 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 06:19:25 INFO train_distill_dimo.py:1014] < PROGRESS: 6.01% | SPEED: 45.706s / step | ETA: 4 days, 23:20:38 > +03/20 06:27:16 INFO train_distill_dimo.py:1081] Iteration 610, lr_s=1.00e-05 lr_a=1.00e-05, time=46.09s +03/20 06:27:16 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 06:27:16 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 06:27:16 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.7188 (4.4984) +03/20 06:27:16 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.6340 (2.4365) +03/20 06:27:16 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0086 (0.0483) +03/20 06:27:16 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 06:27:16 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.3494 (7.5088) +03/20 06:27:16 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.4000) +03/20 06:35:02 INFO train_distill_dimo.py:1081] Iteration 620, lr_s=1.00e-05 lr_a=1.00e-05, time=45.86s +03/20 06:35:02 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 06:35:02 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 06:35:02 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.4062 (4.3618) +03/20 06:35:02 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 1.5705 (2.3021) +03/20 06:35:02 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0502 (0.1104) +03/20 06:35:02 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 06:35:02 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.5119 (7.7178) +03/20 
06:35:02 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.4000) +03/20 06:42:33 INFO train_distill_dimo.py:1081] Iteration 630, lr_s=1.00e-05 lr_a=1.00e-05, time=46.06s +03/20 06:42:33 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 06:42:33 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 06:42:33 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.7969 (6.2258) +03/20 06:42:33 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 1.4636 (2.4499) +03/20 06:42:33 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0336 (0.0558) +03/20 06:42:33 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 06:42:33 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.6257 (7.4070) +03/20 06:42:33 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 06:50:03 INFO train_distill_dimo.py:1081] Iteration 640, lr_s=1.00e-05 lr_a=1.00e-05, time=46.68s +03/20 06:50:03 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 06:50:03 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 06:50:03 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.0938 (6.7109) +03/20 06:50:03 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.7840 (1.7361) +03/20 06:50:03 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0068 (0.0947) +03/20 06:50:03 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 06:50:03 INFO train_distill_dimo.py:1092] Train tok_entropy: 6.4080 (6.7110) +03/20 06:50:03 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.4000) +03/20 06:57:48 INFO train_distill_dimo.py:1081] Iteration 650, lr_s=9.99e-06 lr_a=9.99e-06, time=43.01s +03/20 06:57:48 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 06:57:48 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 06:57:48 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 3.3594 (4.3203) +03/20 06:57:48 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.5477 (1.0930) +03/20 06:57:48 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0215 (0.0388) +03/20 06:57:48 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 06:57:48 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.6050 (7.1186) +03/20 06:57:48 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 07:05:33 INFO train_distill_dimo.py:1081] Iteration 660, lr_s=9.99e-06 lr_a=9.99e-06, time=46.65s +03/20 07:05:33 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 07:05:33 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 07:05:33 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.9219 (7.4141) +03/20 07:05:33 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 1.3242 (1.6282) +03/20 07:05:33 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0036 (0.0128) +03/20 07:05:33 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 07:05:33 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.8901 (7.6236) +03/20 07:05:33 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 07:13:24 INFO train_distill_dimo.py:1081] Iteration 670, lr_s=9.99e-06 lr_a=9.99e-06, time=48.25s +03/20 07:13:24 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 07:13:24 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 07:13:24 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.8281 (5.6023) 
+03/20 07:13:24 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.2213 (0.5886) +03/20 07:13:24 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0054 (0.0145) +03/20 07:13:24 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 07:13:24 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.2291 (8.3127) +03/20 07:13:24 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 07:21:04 INFO train_distill_dimo.py:1081] Iteration 680, lr_s=9.99e-06 lr_a=9.99e-06, time=48.30s +03/20 07:21:04 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 07:21:04 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 07:21:04 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.2500 (5.1688) +03/20 07:21:04 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.4157 (0.4494) +03/20 07:21:04 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0043 (0.0254) +03/20 07:21:04 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 07:21:04 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.0253 (7.9552) +03/20 07:21:04 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 07:28:48 INFO train_distill_dimo.py:1081] Iteration 690, lr_s=9.99e-06 lr_a=9.99e-06, time=44.78s +03/20 07:28:48 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 07:28:48 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 07:28:48 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.9844 (5.8153) +03/20 07:28:48 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.1241 (0.2178) +03/20 07:28:48 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0042 (0.0227) +03/20 07:28:48 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 07:28:48 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.4566 (9.3083) +03/20 07:28:48 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 07:36:30 INFO train_distill_dimo.py:1081] Iteration 700, lr_s=9.99e-06 lr_a=9.99e-06, time=45.43s +03/20 07:36:30 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 07:36:30 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 07:36:30 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.5625 (6.8672) +03/20 07:36:30 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.8371 (2.4035) +03/20 07:36:30 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0117 (0.0252) +03/20 07:36:30 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 07:36:30 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.0826 (8.0974) +03/20 07:36:30 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 07:36:30 INFO train_distill_dimo.py:1014] < PROGRESS: 7.01% | SPEED: 45.783s / step | ETA: 4 days, 22:16:23 > +03/20 07:44:02 INFO train_distill_dimo.py:1081] Iteration 710, lr_s=9.99e-06 lr_a=9.99e-06, time=42.11s +03/20 07:44:02 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 07:44:02 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 07:44:02 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.7969 (8.6219) +03/20 07:44:02 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 1.5695 (1.8063) +03/20 07:44:02 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0045 (0.0131) +03/20 07:44:02 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 07:44:02 INFO train_distill_dimo.py:1092] Train tok_entropy: 
9.3550 (8.6805) +03/20 07:44:02 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.5000 (0.5000) +03/20 07:51:33 INFO train_distill_dimo.py:1081] Iteration 720, lr_s=9.99e-06 lr_a=9.99e-06, time=46.42s +03/20 07:51:33 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 07:51:33 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 07:51:33 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.1406 (7.2875) +03/20 07:51:33 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 1.3737 (3.3281) +03/20 07:51:33 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0028 (0.0194) +03/20 07:51:33 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 07:51:33 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.0435 (8.7915) +03/20 07:51:33 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.5000 (0.5000) +03/20 07:59:04 INFO train_distill_dimo.py:1081] Iteration 730, lr_s=9.99e-06 lr_a=9.99e-06, time=46.73s +03/20 07:59:04 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 07:59:04 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 07:59:04 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 8.9062 (7.0617) +03/20 07:59:04 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.9041 (1.5704) +03/20 07:59:04 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0012 (0.0377) +03/20 07:59:04 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 07:59:04 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.6597 (8.3444) +03/20 07:59:04 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 08:06:40 INFO train_distill_dimo.py:1081] Iteration 740, lr_s=9.99e-06 lr_a=9.99e-06, time=45.62s +03/20 08:06:40 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 08:06:40 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 08:06:40 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.6875 (4.6527) +03/20 08:06:40 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.5487 (0.6811) +03/20 08:06:40 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0117 (0.0497) +03/20 08:06:40 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 08:06:40 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.5398 (7.3326) +03/20 08:06:40 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.4000) +03/20 08:14:13 INFO train_distill_dimo.py:1081] Iteration 750, lr_s=9.98e-06 lr_a=9.98e-06, time=43.74s +03/20 08:14:13 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 08:14:13 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 08:14:13 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 3.5156 (3.4293) +03/20 08:14:13 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.2523 (0.8903) +03/20 08:14:13 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0352 (0.0932) +03/20 08:14:13 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 08:14:13 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.0481 (7.7327) +03/20 08:14:13 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 08:22:06 INFO train_distill_dimo.py:1081] Iteration 760, lr_s=9.98e-06 lr_a=9.98e-06, time=45.40s +03/20 08:22:06 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 08:22:06 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 08:22:06 INFO train_distill_dimo.py:1092] Train 
loss_aux_cond: 7.9062 (6.2584) +03/20 08:22:06 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.8390 (1.1006) +03/20 08:22:06 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0008 (0.0450) +03/20 08:22:06 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 08:22:06 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.0641 (7.9423) +03/20 08:22:06 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.4000) +03/20 08:29:44 INFO train_distill_dimo.py:1081] Iteration 770, lr_s=9.98e-06 lr_a=9.98e-06, time=44.34s +03/20 08:29:44 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 08:29:44 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 08:29:44 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 3.6953 (4.4810) +03/20 08:29:44 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.4162 (1.1980) +03/20 08:29:44 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0084 (0.0633) +03/20 08:29:44 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 08:29:44 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.9048 (8.5006) +03/20 08:29:44 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 08:37:22 INFO train_distill_dimo.py:1081] Iteration 780, lr_s=9.98e-06 lr_a=9.98e-06, time=43.07s +03/20 08:37:22 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 08:37:22 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 08:37:22 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.4219 (4.3943) +03/20 08:37:22 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.4938 (0.6721) +03/20 08:37:22 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0165 (0.0186) +03/20 08:37:22 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 08:37:22 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.0469 (7.9743) +03/20 08:37:22 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.4000) +03/20 08:44:54 INFO train_distill_dimo.py:1081] Iteration 790, lr_s=9.98e-06 lr_a=9.98e-06, time=49.54s +03/20 08:44:54 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 08:44:54 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 08:44:54 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.8594 (6.3426) +03/20 08:44:54 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 1.0938 (2.2673) +03/20 08:44:54 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0112 (0.0663) +03/20 08:44:54 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 08:44:54 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.3006 (8.1253) +03/20 08:44:54 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 1.0000 (0.6000) +03/20 08:52:29 INFO train_distill_dimo.py:1081] Iteration 800, lr_s=9.98e-06 lr_a=9.98e-06, time=45.69s +03/20 08:52:29 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 08:52:29 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 08:52:29 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.4375 (6.1172) +03/20 08:52:29 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.3129 (0.3995) +03/20 08:52:29 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0033 (0.0081) +03/20 08:52:29 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 08:52:29 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.5986 (8.6224) +03/20 08:52:29 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 
0.0000 (0.1000) +03/20 08:52:29 INFO train_distill_dimo.py:1014] < PROGRESS: 8.01% | SPEED: 45.759s / step | ETA: 4 days, 20:56:19 > +03/20 09:00:19 INFO train_distill_dimo.py:1081] Iteration 810, lr_s=9.98e-06 lr_a=9.98e-06, time=47.59s +03/20 09:00:19 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 09:00:19 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 09:00:19 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.1406 (5.7375) +03/20 09:00:19 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.7521 (0.8459) +03/20 09:00:19 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0137 (0.0318) +03/20 09:00:19 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 09:00:19 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.8988 (7.9650) +03/20 09:00:19 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 09:08:08 INFO train_distill_dimo.py:1081] Iteration 820, lr_s=9.97e-06 lr_a=9.97e-06, time=50.54s +03/20 09:08:08 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 09:08:08 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 09:08:08 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.6250 (4.5797) +03/20 09:08:08 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.5366 (0.5633) +03/20 09:08:08 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0146 (0.0551) +03/20 09:08:08 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 09:08:08 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.0599 (7.4238) +03/20 09:08:08 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 09:15:38 INFO train_distill_dimo.py:1081] Iteration 830, lr_s=9.97e-06 lr_a=9.97e-06, time=44.51s +03/20 09:15:38 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 09:15:38 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 09:15:38 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 3.4297 (4.2780) +03/20 09:15:38 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.1787 (0.5813) +03/20 09:15:38 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0152 (0.1192) +03/20 09:15:38 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 09:15:38 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.2975 (7.9413) +03/20 09:15:38 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 09:23:04 INFO train_distill_dimo.py:1081] Iteration 840, lr_s=9.97e-06 lr_a=9.97e-06, time=43.48s +03/20 09:23:04 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 09:23:04 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 09:23:04 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.5938 (6.3120) +03/20 09:23:04 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.4676 (0.7151) +03/20 09:23:04 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0033 (0.0390) +03/20 09:23:04 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 09:23:04 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.5826 (9.4504) +03/20 09:23:04 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.5000 (0.5000) +03/20 09:30:35 INFO train_distill_dimo.py:1081] Iteration 850, lr_s=9.97e-06 lr_a=9.97e-06, time=45.10s +03/20 09:30:35 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 09:30:35 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 09:30:35 INFO 
train_distill_dimo.py:1092] Train loss_aux_cond: 4.7656 (4.4451) +03/20 09:30:35 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.3835 (0.5490) +03/20 09:30:35 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0307 (0.0756) +03/20 09:30:35 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 09:30:35 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.6687 (7.9881) +03/20 09:30:35 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 09:38:19 INFO train_distill_dimo.py:1081] Iteration 860, lr_s=9.97e-06 lr_a=9.97e-06, time=48.74s +03/20 09:38:19 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 09:38:19 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 09:38:19 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.6406 (4.8004) +03/20 09:38:19 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.6404 (1.1778) +03/20 09:38:19 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0318 (0.0758) +03/20 09:38:19 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 09:38:19 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.1179 (8.5812) +03/20 09:38:19 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.4000) +03/20 09:46:01 INFO train_distill_dimo.py:1081] Iteration 870, lr_s=9.97e-06 lr_a=9.97e-06, time=44.49s +03/20 09:46:01 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 09:46:01 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 09:46:01 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.4375 (5.1137) +03/20 09:46:01 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.5083 (0.5072) +03/20 09:46:01 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0020 (0.0365) +03/20 09:46:01 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 09:46:01 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.5980 (9.1134) +03/20 09:46:01 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.4000) +03/20 09:53:59 INFO train_distill_dimo.py:1081] Iteration 880, lr_s=9.96e-06 lr_a=9.96e-06, time=46.62s +03/20 09:53:59 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 09:53:59 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 09:53:59 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.6719 (5.7859) +03/20 09:53:59 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.3009 (0.5515) +03/20 09:53:59 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0012 (0.0255) +03/20 09:53:59 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 09:53:59 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.3248 (7.9320) +03/20 09:53:59 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 10:01:39 INFO train_distill_dimo.py:1081] Iteration 890, lr_s=9.96e-06 lr_a=9.96e-06, time=47.56s +03/20 10:01:39 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 10:01:39 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 10:01:39 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 0.5122 (2.8565) +03/20 10:01:39 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.9262 (2.1685) +03/20 10:01:39 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0838 (0.1559) +03/20 10:01:39 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 10:01:39 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.1982 (8.0559) +03/20 10:01:39 INFO 
train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.4000) +03/20 10:09:28 INFO train_distill_dimo.py:1081] Iteration 900, lr_s=9.96e-06 lr_a=9.96e-06, time=43.88s +03/20 10:09:28 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 10:09:28 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 10:09:28 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.3750 (4.8260) +03/20 10:09:28 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.4784 (0.5777) +03/20 10:09:28 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0109 (0.0253) +03/20 10:09:28 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 10:09:28 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.3214 (7.5245) +03/20 10:09:28 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.5000 (0.5000) +03/20 10:09:28 INFO train_distill_dimo.py:1014] < PROGRESS: 9.01% | SPEED: 45.807s / step | ETA: 4 days, 19:47:22 > +03/20 10:17:01 INFO train_distill_dimo.py:1081] Iteration 910, lr_s=9.96e-06 lr_a=9.96e-06, time=45.91s +03/20 10:17:01 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 10:17:01 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 10:17:01 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.7812 (5.3645) +03/20 10:17:01 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.4561 (0.7248) +03/20 10:17:01 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0102 (0.0723) +03/20 10:17:01 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 10:17:01 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.3459 (9.0930) +03/20 10:17:01 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.2000) +03/20 10:24:45 INFO train_distill_dimo.py:1081] Iteration 920, lr_s=9.96e-06 lr_a=9.96e-06, time=46.15s +03/20 10:24:45 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 10:24:45 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 10:24:45 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.5938 (5.4211) +03/20 10:24:45 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.4631 (0.5637) +03/20 10:24:45 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0077 (0.0295) +03/20 10:24:45 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 10:24:45 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.0667 (8.0249) +03/20 10:24:45 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.5000 (0.5000) +03/20 10:32:24 INFO train_distill_dimo.py:1081] Iteration 930, lr_s=9.95e-06 lr_a=9.95e-06, time=46.77s +03/20 10:32:24 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 10:32:24 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 10:32:24 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.7656 (5.2809) +03/20 10:32:24 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 1.8512 (2.4156) +03/20 10:32:24 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0551 (0.1151) +03/20 10:32:24 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 10:32:24 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.5530 (7.4917) +03/20 10:32:24 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 1.0000 (0.7000) +03/20 10:39:58 INFO train_distill_dimo.py:1081] Iteration 940, lr_s=9.95e-06 lr_a=9.95e-06, time=47.52s +03/20 10:39:58 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 10:39:58 INFO train_distill_dimo.py:1092] Train baseline_ema: 
0.0000 (0.0000) +03/20 10:39:58 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.3281 (5.0777) +03/20 10:39:58 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 1.7315 (2.3507) +03/20 10:39:58 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0712 (0.1117) +03/20 10:39:58 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 10:39:58 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.8787 (8.0363) +03/20 10:39:58 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.5000 (0.5000) +03/20 10:47:31 INFO train_distill_dimo.py:1081] Iteration 950, lr_s=9.95e-06 lr_a=9.95e-06, time=43.85s +03/20 10:47:31 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 10:47:31 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 10:47:31 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.2578 (4.9879) +03/20 10:47:31 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.7196 (1.4278) +03/20 10:47:31 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0378 (0.1300) +03/20 10:47:31 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 10:47:31 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.4776 (8.3451) +03/20 10:47:31 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.4000) +03/20 10:55:27 INFO train_distill_dimo.py:1081] Iteration 960, lr_s=9.95e-06 lr_a=9.95e-06, time=47.30s +03/20 10:55:27 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 10:55:27 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 10:55:27 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.0000 (5.8023) +03/20 10:55:27 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.4052 (0.6305) +03/20 10:55:27 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0032 (0.0283) +03/20 10:55:27 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 10:55:27 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.0225 (8.9379) +03/20 10:55:27 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.5000 (0.5000) +03/20 11:02:58 INFO train_distill_dimo.py:1081] Iteration 970, lr_s=9.95e-06 lr_a=9.95e-06, time=45.20s +03/20 11:02:58 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 11:02:58 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 11:02:58 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.5469 (5.0616) +03/20 11:02:58 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 1.0118 (1.5468) +03/20 11:02:58 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0093 (0.1281) +03/20 11:02:58 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 11:02:58 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.2521 (9.0324) +03/20 11:02:58 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 1.0000 (0.6000) +03/20 11:10:43 INFO train_distill_dimo.py:1081] Iteration 980, lr_s=9.94e-06 lr_a=9.94e-06, time=53.88s +03/20 11:10:43 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 11:10:43 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 11:10:43 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.1250 (6.2578) +03/20 11:10:43 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.6576 (1.8272) +03/20 11:10:43 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0049 (0.0572) +03/20 11:10:43 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 11:10:43 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.1971 (8.3028) 
+03/20 11:10:43 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 1.0000 (0.8000) +03/20 11:18:26 INFO train_distill_dimo.py:1081] Iteration 990, lr_s=9.94e-06 lr_a=9.94e-06, time=48.07s +03/20 11:18:26 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 11:18:26 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 11:18:26 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 8.8438 (7.5625) +03/20 11:18:26 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.6074 (1.2890) +03/20 11:18:26 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0020 (0.0169) +03/20 11:18:26 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 11:18:26 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.4057 (8.4536) +03/20 11:18:26 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.5000 (0.5000) +03/20 11:26:05 INFO train_distill_dimo.py:1081] Iteration 1000, lr_s=9.94e-06 lr_a=9.94e-06, time=45.79s +03/20 11:26:05 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 11:26:05 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 11:26:05 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.0703 (4.3781) +03/20 11:26:05 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 1.1531 (1.0424) +03/20 11:26:05 INFO train_distill_dimo.py:1092] Train loss_pg: 0.1395 (0.1688) +03/20 11:26:05 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 11:26:05 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.5133 (9.3862) +03/20 11:26:05 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 1.0000 (0.6000) +03/20 11:26:05 INFO train_distill_dimo.py:1014] < PROGRESS: 10.01% | SPEED: 45.823s / step | ETA: 4 days, 18:33:30 > +03/20 11:26:12 INFO train_distill_dimo.py:1067] [save] step=1000 → ./experiments/distill_dimo_v2/checkpoints/checkpoint-1000 +03/20 11:33:53 INFO train_distill_dimo.py:1081] Iteration 1010, lr_s=9.94e-06 lr_a=9.94e-06, time=45.60s +03/20 11:33:53 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 11:33:53 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 11:33:53 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.5625 (5.8995) +03/20 11:33:53 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.7658 (1.0402) +03/20 11:33:53 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0030 (0.0534) +03/20 11:33:53 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 11:33:53 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.5361 (9.4274) +03/20 11:33:53 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.5000 (0.5000) +03/20 11:41:19 INFO train_distill_dimo.py:1081] Iteration 1020, lr_s=9.93e-06 lr_a=9.93e-06, time=43.81s +03/20 11:41:19 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 11:41:19 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 11:41:19 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 3.3906 (3.9713) +03/20 11:41:19 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.6369 (0.8400) +03/20 11:41:19 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0100 (0.0928) +03/20 11:41:19 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 11:41:19 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.9391 (8.0048) +03/20 11:41:19 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.5000 (0.5000) +03/20 11:48:58 INFO train_distill_dimo.py:1081] Iteration 1030, lr_s=9.93e-06 lr_a=9.93e-06, 
time=42.98s +03/20 11:48:58 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 11:48:58 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 11:48:58 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.3594 (6.7859) +03/20 11:48:58 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.9339 (1.3926) +03/20 11:48:58 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0018 (0.0190) +03/20 11:48:58 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 11:48:58 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.7365 (8.5551) +03/20 11:48:58 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.5000 (0.5000) +03/20 11:56:51 INFO train_distill_dimo.py:1081] Iteration 1040, lr_s=9.93e-06 lr_a=9.93e-06, time=48.63s +03/20 11:56:51 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 11:56:51 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 11:56:51 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.6250 (6.0859) +03/20 11:56:51 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.5444 (2.1542) +03/20 11:56:51 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0014 (0.0435) +03/20 11:56:51 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 11:56:51 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.9377 (8.0005) +03/20 11:56:51 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.4000) +03/20 12:04:52 INFO train_distill_dimo.py:1081] Iteration 1050, lr_s=9.93e-06 lr_a=9.93e-06, time=45.04s +03/20 12:04:52 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 12:04:52 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 12:04:52 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.3438 (5.9625) +03/20 12:04:52 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.3637 (0.4197) +03/20 12:04:52 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0031 (0.0221) +03/20 12:04:52 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 12:04:52 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.9339 (8.0623) +03/20 12:04:52 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.5000 (0.5000) +03/20 12:12:56 INFO train_distill_dimo.py:1081] Iteration 1060, lr_s=9.92e-06 lr_a=9.92e-06, time=49.81s +03/20 12:12:56 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 12:12:56 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 12:12:56 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.2500 (5.7762) +03/20 12:12:56 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.6880 (0.7280) +03/20 12:12:56 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0054 (0.0306) +03/20 12:12:56 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 12:12:56 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.7084 (7.5786) +03/20 12:12:56 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.5000 (0.5000) +03/20 12:20:30 INFO train_distill_dimo.py:1081] Iteration 1070, lr_s=9.92e-06 lr_a=9.92e-06, time=43.99s +03/20 12:20:30 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 12:20:30 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 12:20:30 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 6.8906 (5.7896) +03/20 12:20:30 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.5714 (1.1943) +03/20 12:20:30 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0047 (0.0272) +03/20 
12:20:30 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 12:20:30 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.9214 (8.6871) +03/20 12:20:30 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.5000 (0.5000) +03/20 12:28:01 INFO train_distill_dimo.py:1081] Iteration 1080, lr_s=9.92e-06 lr_a=9.92e-06, time=46.62s +03/20 12:28:01 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 12:28:01 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 12:28:01 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 8.1094 (6.0746) +03/20 12:28:01 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.7606 (1.1832) +03/20 12:28:01 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0066 (0.0533) +03/20 12:28:01 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 12:28:01 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.3266 (8.3646) +03/20 12:28:01 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 1.0000 (0.7000) +03/20 12:35:39 INFO train_distill_dimo.py:1081] Iteration 1090, lr_s=9.91e-06 lr_a=9.91e-06, time=43.36s +03/20 12:35:39 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 12:35:39 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 12:35:39 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.7969 (7.2898) +03/20 12:35:39 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.3975 (0.6985) +03/20 12:35:39 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0009 (0.0038) +03/20 12:35:39 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 12:35:39 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.8028 (8.8324) +03/20 12:35:39 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.4000) +03/20 12:43:08 INFO train_distill_dimo.py:1081] Iteration 1100, lr_s=9.91e-06 lr_a=9.91e-06, time=41.96s +03/20 12:43:08 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 12:43:08 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 12:43:08 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.8281 (5.9391) +03/20 12:43:08 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 1.1167 (2.7393) +03/20 12:43:08 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0160 (0.0468) +03/20 12:43:08 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 12:43:08 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.9153 (7.6421) +03/20 12:43:08 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 1.0000 (0.9000) +03/20 12:43:08 INFO train_distill_dimo.py:1014] < PROGRESS: 11.01% | SPEED: 45.854s / step | ETA: 4 days, 17:21:40 > +03/20 12:50:44 INFO train_distill_dimo.py:1081] Iteration 1110, lr_s=9.91e-06 lr_a=9.91e-06, time=47.90s +03/20 12:50:44 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 12:50:44 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 12:50:44 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.9688 (6.6579) +03/20 12:50:44 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.5865 (0.6479) +03/20 12:50:44 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0006 (0.0489) +03/20 12:50:44 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 12:50:44 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.0647 (9.0656) +03/20 12:50:44 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.4000) +03/20 12:58:14 INFO train_distill_dimo.py:1081] Iteration 1120, 
lr_s=9.91e-06 lr_a=9.91e-06, time=44.62s +03/20 12:58:14 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 12:58:14 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 12:58:14 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.9688 (5.2848) +03/20 12:58:14 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.5073 (0.8357) +03/20 12:58:14 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0087 (0.0478) +03/20 12:58:14 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 12:58:14 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.3752 (7.6048) +03/20 12:58:14 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 1.0000 (0.6000) +03/20 13:05:58 INFO train_distill_dimo.py:1081] Iteration 1130, lr_s=9.90e-06 lr_a=9.90e-06, time=44.03s +03/20 13:05:58 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 13:05:58 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 13:05:58 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.5938 (5.6664) +03/20 13:05:58 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.4055 (0.5582) +03/20 13:05:58 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0061 (0.0174) +03/20 13:05:58 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 13:05:58 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.3620 (9.0372) +03/20 13:05:58 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 1.0000 (0.6000) +03/20 13:13:49 INFO train_distill_dimo.py:1081] Iteration 1140, lr_s=9.90e-06 lr_a=9.90e-06, time=47.43s +03/20 13:13:49 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 13:13:49 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 13:13:49 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 0.5840 (2.8208) +03/20 13:13:49 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.5955 (0.8647) +03/20 13:13:49 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0438 (0.0361) +03/20 13:13:49 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 13:13:49 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.7405 (7.9315) +03/20 13:13:49 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 1.0000 (0.6000) +03/20 13:21:32 INFO train_distill_dimo.py:1081] Iteration 1150, lr_s=9.90e-06 lr_a=9.90e-06, time=45.72s +03/20 13:21:32 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 13:21:32 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 13:21:32 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 4.2422 (4.3033) +03/20 13:21:32 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.3567 (0.4591) +03/20 13:21:32 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0163 (0.0415) +03/20 13:21:32 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 13:21:32 INFO train_distill_dimo.py:1092] Train tok_entropy: 7.5565 (7.7674) +03/20 13:21:32 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.3000) +03/20 13:29:16 INFO train_distill_dimo.py:1081] Iteration 1160, lr_s=9.89e-06 lr_a=9.89e-06, time=47.55s +03/20 13:29:16 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 13:29:16 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 13:29:16 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 3.8750 (3.7139) +03/20 13:29:16 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.6659 (0.8775) +03/20 13:29:16 INFO train_distill_dimo.py:1092] Train 
loss_pg: 0.0125 (0.0457) +03/20 13:29:16 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 13:29:16 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.0413 (9.0254) +03/20 13:29:16 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 0.0000 (0.4000) +03/20 13:36:56 INFO train_distill_dimo.py:1081] Iteration 1170, lr_s=9.89e-06 lr_a=9.89e-06, time=44.05s +03/20 13:36:56 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 13:36:56 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 13:36:56 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 5.6875 (5.1111) +03/20 13:36:56 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.6559 (1.0242) +03/20 13:36:56 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0075 (0.0083) +03/20 13:36:56 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 13:36:56 INFO train_distill_dimo.py:1092] Train tok_entropy: 8.6187 (8.4780) +03/20 13:36:56 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 1.0000 (0.8000) +03/20 13:44:04 INFO train_distill_dimo.py:1081] Iteration 1180, lr_s=9.89e-06 lr_a=9.89e-06, time=45.94s +03/20 13:44:04 INFO train_distill_dimo.py:1092] Train H_mean: 0.0000 (0.0000) +03/20 13:44:04 INFO train_distill_dimo.py:1092] Train baseline_ema: 0.0000 (0.0000) +03/20 13:44:04 INFO train_distill_dimo.py:1092] Train loss_aux_cond: 7.7344 (6.7500) +03/20 13:44:04 INFO train_distill_dimo.py:1092] Train loss_kd_cond: 0.3032 (1.7264) +03/20 13:44:04 INFO train_distill_dimo.py:1092] Train loss_pg: 0.0074 (0.0386) +03/20 13:44:04 INFO train_distill_dimo.py:1092] Train mean_logp_tok: 0.0000 (0.0000) +03/20 13:44:04 INFO train_distill_dimo.py:1092] Train tok_entropy: 9.7459 (9.6081) +03/20 13:44:04 INFO train_distill_dimo.py:1092] Train use_guided_ratio: 1.0000 (0.6000) + +
+Question 2: How low should the final loss metrics ideally go? Because we completely rebuilt the algorithm, swapping "reinforcement learning" for a "pure mathematical approximation", each loss metric now has a clear physical meaning and an ideal threshold. Below is the **"ideal EKG"** you should expect to see over the rest of training (anchor values assume a vocabulary size of $K=64000$); hedged sketches of points 2-4 follow the list.
+1. loss_aux_cond (Aux fitting error). Starting point: $10.0 \sim 11.0$, i.e. blind guessing (the cross-entropy of a uniform distribution over $K=64000$ tokens is $\ln K \approx 11.07$ nats). Ideal endpoint: between $2.0 \sim 4.0$. Analysis: this value can never reach 0, because the Student's outputs are highly stochastic. As long as it settles firmly below 4.0, the Aux is already a fully competent "stand-in" that faithfully captures the contours of the Student's probability distribution.
+2. loss_kd_cond (knowledge-distillation KL divergence). Expected trajectory: very low $\rightarrow$ rising $\rightarrow$ slow decline. Ideal endpoint: between $0.1 \sim 0.5$. Analysis: this is the single most important metric. Early on, while the CFG guidance ratio is low, it stays close to 0; once the Teacher starts routinely using CFG=7 to hand out extremely sharp "perfect answers", this loss spikes to 1.0 or even beyond 2.0. After a long tug-of-war, when it falls back and hovers in the $0.1 \sim 0.5$ band, your Student has essentially internalized what the Teacher does at CFG=7.
+3. loss_pg (Bridge pseudo-gradient injection). Ideal range: roughly $0.01 \sim 0.1$. Analysis: this is the scale of the gradient we force-inject via the MSE trick. As long as it is neither 0.0000 (no signal is getting through) nor > 1.0 (gradient explosion, which can destabilize the model), but instead holds steady somewhere below 1.0, the correction signal is streaming into the Student continuously and smoothly.
+4. tok_entropy (generation diversity). Red line: must never fall below $6.0$. Ideal state: between $7.5 \sim 9.5$. Analysis: this reflects the visual diversity of what the model generates. If you ever see this value drop to $3.0$ or even $1.0$, the model has undergone severe **mode collapse**: it has found a shortcut to a low loss and starts drawing the same few images no matter the prompt. If that happens, stop training immediately.
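+A minimal sketch of the quantity behind point 2, assuming loss_kd_cond is a KL divergence from a classifier-free-guidance-sharpened Teacher to the Student (the function and variable names are illustrative, not the trainer's actual API; cfg_scale and tau mirror teacher_cfg_scale and tau_kd from the config):
```python
import torch
import torch.nn.functional as F

def kd_loss(t_cond: torch.Tensor, t_uncond: torch.Tensor,
            s_logits: torch.Tensor, cfg_scale: float = 7.0,
            tau: float = 1.0) -> torch.Tensor:
    # Classifier-free guidance: extrapolate the conditional logits away from
    # the unconditional ones; a large cfg_scale makes the target distribution
    # very sharp, which is what drives the mid-training spike in loss_kd_cond.
    guided = t_uncond + cfg_scale * (t_cond - t_uncond)
    p_t = F.softmax(guided / tau, dim=-1)
    logp_s = F.log_softmax(s_logits / tau, dim=-1)
    # Per-token KL(teacher || student), averaged over all token positions.
    return (p_t * (torch.log(p_t + 1e-12) - logp_s)).sum(dim=-1).mean()
```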
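+For point 3, assuming the "MSE trick" takes the standard surrogate-loss form (a hedged sketch, not necessarily the exact code in train_distill_dimo.py): the MSE between x and a detached copy of (x - g) backpropagates exactly the injected correction g, and its value equals $0.5\,\|g\|^2$, which is why a reading of 0.0000 means nothing is flowing and readings above 1.0 signal oversized gradients.
```python
import torch

def pseudo_grad_loss(x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
    # target = x - g with gradients blocked, so d(loss)/dx = x - target = g:
    # backward() routes the injected correction g straight into x.
    target = (x - g).detach()
    return 0.5 * ((x - target) ** 2).sum()

x = torch.randn(8, requires_grad=True)
g = 0.1 * torch.ones_like(x)       # the correction we want to inject
loss = pseudo_grad_loss(x, g)      # numerically equal to 0.5 * ||g||^2
loss.backward()
assert torch.allclose(x.grad, g)
```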
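+And for point 4, assuming tok_entropy is the mean per-token softmax entropy in nats (an assumption, but consistent with the logged values starting near 9.8, just under the uniform ceiling $\ln 64000 \approx 11.07$), the red-line check is a one-liner:
```python
import torch

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    # Mean per-token entropy in nats; uniform over K = 64000 gives ~11.07.
    logp = torch.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(dim=-1).mean()

h = token_entropy(torch.randn(2, 16, 64000))
assert h > 6.0, "tok_entropy below 6.0: likely mode collapse, stop training"
```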
\ No newline at end of file diff --git a/URSA/experiments/distill_dimo_v3/config.yaml b/URSA/experiments/distill_dimo_v3/config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..fecc5b785d0b31152b14fa59125ce528e2cfe8b1 --- /dev/null +++ b/URSA/experiments/distill_dimo_v3/config.yaml @@ -0,0 +1,68 @@ +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo_v3 + log_every: 10 + save_every: 100 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 49 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 1.0 + lambda_pg: 1.0 + lambda_ent: 0.0 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 1000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + fake_rounds: 2 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml diff --git a/URSA/experiments/distill_dimo_v3/logs/20260320_135549.log b/URSA/experiments/distill_dimo_v3/logs/20260320_135549.log new file mode 100644 index 0000000000000000000000000000000000000000..87da5fbd8dc1ba10847bce97deecca19f458af90 --- /dev/null +++ b/URSA/experiments/distill_dimo_v3/logs/20260320_135549.log @@ -0,0 +1,76 @@ +03/20 13:55:49 INFO train_distill_dimo.py:1267] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo_v3 + log_every: 10 + save_every: 100 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 49 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.0 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01
+lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/20 13:55:49 INFO train_distill_dimo.py:160] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... +03/20 13:57:12 INFO train_distill_dimo.py:183] [init] latents_shape=(13,40,64) N=33280 K=64000 CFG=ON +03/20 13:57:41 INFO train_distill_dimo.py:309] [init] verified_native_regime=True geometry=(49×320×512) teacher_cfg_scale=7.0 +03/20 13:57:41 INFO train_distill_dimo.py:327] [init] student params: 0.00M +03/20 13:57:41 INFO train_distill_dimo.py:330] [init] max_train_steps=10000 batch_size_per_gpu=1 num_processes=8 +03/20 13:57:41 INFO train_distill_dimo.py:1014] [train] Starting from step 0 / 10000 diff --git a/URSA/experiments/distill_dimo_v3/logs/20260320_140145.log b/URSA/experiments/distill_dimo_v3/logs/20260320_140145.log new file mode 100644 index 0000000000000000000000000000000000000000..bbd91a289cf9c0ae4cb1d50546552b8e647e52de --- /dev/null +++ b/URSA/experiments/distill_dimo_v3/logs/20260320_140145.log @@ -0,0 +1,71 @@ +03/20 14:01:45 INFO train_distill_dimo.py:1286] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo_v3 + log_every: 10 + save_every: 100 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 49 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.0 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/20 14:01:45 INFO train_distill_dimo.py:160] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
diff --git a/URSA/experiments/distill_dimo_v3/logs/20260320_140318.log b/URSA/experiments/distill_dimo_v3/logs/20260320_140318.log new file mode 100644 index 0000000000000000000000000000000000000000..748acffbe31faddacc1e9ff5811c62f91e10be6f --- /dev/null +++ b/URSA/experiments/distill_dimo_v3/logs/20260320_140318.log @@ -0,0 +1,76 @@ +03/20 14:03:18 INFO train_distill_dimo.py:1286] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo_v3 + log_every: 10 + save_every: 100 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 49 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.0 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 50 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/20 14:03:18 INFO train_distill_dimo.py:160] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
+03/20 14:04:25 INFO train_distill_dimo.py:183] [init] latents_shape=(13,40,64) N=33280 K=64000 CFG=ON +03/20 14:04:47 INFO train_distill_dimo.py:309] [init] verified_native_regime=True geometry=(49×320×512) teacher_cfg_scale=7.0 +03/20 14:04:47 INFO train_distill_dimo.py:327] [init] student params: 0.00M +03/20 14:04:47 INFO train_distill_dimo.py:330] [init] max_train_steps=10000 batch_size_per_gpu=1 num_processes=8 +03/20 14:04:47 INFO train_distill_dimo.py:1033] [train] Starting from step 0 / 10000 diff --git a/URSA/experiments/distill_dimo_v3/logs/20260320_140911.log b/URSA/experiments/distill_dimo_v3/logs/20260320_140911.log new file mode 100644 index 0000000000000000000000000000000000000000..2e6da8b99742b7490c36dec4f086e2abcf019052 --- /dev/null +++ b/URSA/experiments/distill_dimo_v3/logs/20260320_140911.log @@ -0,0 +1,1887 @@ +03/20 14:09:11 INFO train_distill_dimo.py:1286] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/distill_dimo_v3 + log_every: 10 + save_every: 100 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 10000 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 49 + height: 320 + width: 512 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 1.0 + lambda_pg: 1.0 + lambda_ent: 0.0 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 7.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 1000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + fake_rounds: 2 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/20 14:09:11 INFO train_distill_dimo.py:160] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
+03/20 14:10:32 INFO train_distill_dimo.py:183] [init] latents_shape=(13,40,64) N=33280 K=64000 CFG=ON +03/20 14:11:04 INFO train_distill_dimo.py:309] [init] verified_native_regime=True geometry=(49×320×512) teacher_cfg_scale=7.0 +03/20 14:11:04 INFO train_distill_dimo.py:327] [init] student params: 0.00M +03/20 14:11:04 INFO train_distill_dimo.py:330] [init] max_train_steps=10000 batch_size_per_gpu=1 num_processes=8 +03/20 14:11:04 INFO train_distill_dimo.py:1033] [train] Starting from step 0 / 10000 +03/20 14:21:02 INFO train_distill_dimo.py:1149] Iteration 10, lr_s=2.10e-07 lr_a=2.10e-07, time=56.79s +03/20 14:21:02 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 14:21:02 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 14:21:02 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 10.0938 (9.1000) +03/20 14:21:02 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.0025 (0.0027) +03/20 14:21:02 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0000 (0.0000) +03/20 14:21:02 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 14:21:02 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.8741 (9.8447) +03/20 14:21:02 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.0000) +03/20 14:30:09 INFO train_distill_dimo.py:1149] Iteration 20, lr_s=4.10e-07 lr_a=4.10e-07, time=53.59s +03/20 14:30:09 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 14:30:09 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 14:30:09 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.3281 (7.0383) +03/20 14:30:09 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.0034 (0.0035) +03/20 14:30:09 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0001 (0.0078) +03/20 14:30:09 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 14:30:09 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.8746 (9.8733) +03/20 14:30:09 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.0000) +03/20 14:39:52 INFO train_distill_dimo.py:1149] Iteration 30, lr_s=6.09e-07 lr_a=6.09e-07, time=54.35s +03/20 14:39:52 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 14:39:52 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 14:39:52 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.2500 (6.0359) +03/20 14:39:52 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.0052 (0.0050) +03/20 14:39:52 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0080 (0.0352) +03/20 14:39:52 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 14:39:52 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.8221 (9.8064) +03/20 14:39:52 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.0000) +03/20 14:49:04 INFO train_distill_dimo.py:1149] Iteration 40, lr_s=8.09e-07 lr_a=8.09e-07, time=52.17s +03/20 14:49:04 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 14:49:04 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 14:49:04 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.1641 (5.5541) +03/20 14:49:04 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.0064 (0.0070) +03/20 14:49:04 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0108 (0.0933) +03/20 14:49:04 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 14:49:04 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.8905 (9.7489) +03/20 14:49:04 INFO 
train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.0000) +03/20 14:58:23 INFO train_distill_dimo.py:1149] Iteration 50, lr_s=1.01e-06 lr_a=1.01e-06, time=55.00s +03/20 14:58:23 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 14:58:23 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 14:58:23 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 9.3750 (8.0828) +03/20 14:58:23 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.0091 (0.0145) +03/20 14:58:23 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0001 (0.0060) +03/20 14:58:23 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 14:58:23 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.8598 (9.8387) +03/20 14:58:23 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.0000) +03/20 15:07:19 INFO train_distill_dimo.py:1149] Iteration 60, lr_s=1.21e-06 lr_a=1.21e-06, time=58.23s +03/20 15:07:19 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 15:07:19 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 15:07:19 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.8438 (6.4557) +03/20 15:07:19 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.0084 (0.0951) +03/20 15:07:19 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0006 (0.0963) +03/20 15:07:19 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 15:07:19 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.6987 (9.6934) +03/20 15:07:19 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.1000) +03/20 15:15:58 INFO train_distill_dimo.py:1149] Iteration 70, lr_s=1.41e-06 lr_a=1.41e-06, time=53.70s +03/20 15:15:58 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 15:15:58 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 15:15:58 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.5781 (5.1740) +03/20 15:15:58 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.0083 (0.0095) +03/20 15:15:58 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0039 (0.0744) +03/20 15:15:58 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 15:15:58 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.7133 (9.6300) +03/20 15:15:58 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.0000) +03/20 15:24:31 INFO train_distill_dimo.py:1149] Iteration 80, lr_s=1.61e-06 lr_a=1.61e-06, time=52.72s +03/20 15:24:31 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 15:24:31 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 15:24:31 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 0.2812 (2.3121) +03/20 15:24:31 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.0580 (0.1475) +03/20 15:24:31 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0379 (0.0628) +03/20 15:24:31 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 15:24:31 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.8852 (7.9155) +03/20 15:24:31 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.3000) +03/20 15:33:16 INFO train_distill_dimo.py:1149] Iteration 90, lr_s=1.81e-06 lr_a=1.81e-06, time=53.47s +03/20 15:33:16 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 15:33:16 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 15:33:16 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.5156 (3.0553) +03/20 15:33:16 INFO 
train_distill_dimo.py:1160] Train loss_kd_cond: 0.0306 (0.0593) +03/20 15:33:16 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0136 (0.0259) +03/20 15:33:16 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 15:33:16 INFO train_distill_dimo.py:1160] Train tok_entropy: 6.8609 (6.4548) +03/20 15:33:16 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.1000) +03/20 15:41:48 INFO train_distill_dimo.py:1149] Iteration 100, lr_s=2.01e-06 lr_a=2.01e-06, time=60.04s +03/20 15:41:48 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 15:41:48 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 15:41:48 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 1.3320 (2.4365) +03/20 15:41:48 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.0259 (0.0463) +03/20 15:41:48 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0689 (0.0779) +03/20 15:41:48 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 15:41:48 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.4016 (7.3736) +03/20 15:41:48 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.3000) +03/20 15:41:48 INFO train_distill_dimo.py:1047] < PROGRESS: 1.01% | SPEED: 54.440s / step | ETA: 6 days, 5:42:37 > +03/20 15:42:10 INFO train_distill_dimo.py:1135] [save] step=100 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-100 (ZeRO-3 Gathered) +03/20 15:51:02 INFO train_distill_dimo.py:1149] Iteration 110, lr_s=2.21e-06 lr_a=2.21e-06, time=55.87s +03/20 15:51:02 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 15:51:02 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 15:51:02 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.9297 (4.0442) +03/20 15:51:02 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.0334 (0.0669) +03/20 15:51:02 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0054 (0.0267) +03/20 15:51:02 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 15:51:02 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.6797 (7.7058) +03/20 15:51:02 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.0000) +03/20 15:59:35 INFO train_distill_dimo.py:1149] Iteration 120, lr_s=2.41e-06 lr_a=2.41e-06, time=53.42s +03/20 15:59:35 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 15:59:35 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 15:59:35 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.0820 (3.4813) +03/20 15:59:35 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.0268 (0.0428) +03/20 15:59:35 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0092 (0.0551) +03/20 15:59:35 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 15:59:35 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.4215 (8.0122) +03/20 15:59:35 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.1000) +03/20 16:08:22 INFO train_distill_dimo.py:1149] Iteration 130, lr_s=2.61e-06 lr_a=2.61e-06, time=50.81s +03/20 16:08:22 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 16:08:22 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 16:08:22 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.5938 (3.8418) +03/20 16:08:22 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.0495 (0.2024) +03/20 16:08:22 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0079 (0.0255) +03/20 16:08:22 INFO 
train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 16:08:22 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.3783 (7.5835) +03/20 16:08:22 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.2000) +03/20 16:16:59 INFO train_distill_dimo.py:1149] Iteration 140, lr_s=2.81e-06 lr_a=2.81e-06, time=56.20s +03/20 16:16:59 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 16:16:59 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 16:16:59 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.6094 (5.4820) +03/20 16:16:59 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.0774 (0.1431) +03/20 16:16:59 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0018 (0.0114) +03/20 16:16:59 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 16:16:59 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.3325 (7.3386) +03/20 16:16:59 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.2000) +03/20 16:25:28 INFO train_distill_dimo.py:1149] Iteration 150, lr_s=3.01e-06 lr_a=3.01e-06, time=48.39s +03/20 16:25:28 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 16:25:28 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 16:25:28 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.5312 (4.4267) +03/20 16:25:28 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.0648 (0.1981) +03/20 16:25:28 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0061 (0.0243) +03/20 16:25:28 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 16:25:28 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.5403 (7.6610) +03/20 16:25:28 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.0000) +03/20 16:33:44 INFO train_distill_dimo.py:1149] Iteration 160, lr_s=3.21e-06 lr_a=3.21e-06, time=48.98s +03/20 16:33:44 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 16:33:44 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 16:33:44 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.5938 (6.1688) +03/20 16:33:44 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.1972 (0.7289) +03/20 16:33:44 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0019 (0.0069) +03/20 16:33:44 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 16:33:44 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.7923 (7.7677) +03/20 16:33:44 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.3000) +03/20 16:41:57 INFO train_distill_dimo.py:1149] Iteration 170, lr_s=3.41e-06 lr_a=3.41e-06, time=48.00s +03/20 16:41:57 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 16:41:57 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 16:41:57 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.5547 (3.8745) +03/20 16:41:57 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.1086 (0.2997) +03/20 16:41:57 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0104 (0.0393) +03/20 16:41:57 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 16:41:57 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.6966 (7.9035) +03/20 16:41:57 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.2000) +03/20 16:50:22 INFO train_distill_dimo.py:1149] Iteration 180, lr_s=3.61e-06 lr_a=3.61e-06, time=52.55s +03/20 16:50:22 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 16:50:22 
INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 16:50:22 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.2188 (5.5875) +03/20 16:50:22 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3295 (0.3575) +03/20 16:50:22 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0057 (0.0088) +03/20 16:50:22 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 16:50:22 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0253 (8.6161) +03/20 16:50:22 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.4000) +03/20 16:59:16 INFO train_distill_dimo.py:1149] Iteration 190, lr_s=3.81e-06 lr_a=3.81e-06, time=50.81s +03/20 16:59:16 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 16:59:16 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 16:59:16 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.6094 (6.3773) +03/20 16:59:16 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.0566 (0.1727) +03/20 16:59:16 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0004 (0.0098) +03/20 16:59:16 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 16:59:16 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.5167 (8.5073) +03/20 16:59:16 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.1000) +03/20 17:08:31 INFO train_distill_dimo.py:1149] Iteration 200, lr_s=4.01e-06 lr_a=4.01e-06, time=58.04s +03/20 17:08:31 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 17:08:31 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 17:08:31 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.8594 (4.7222) +03/20 17:08:31 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.1168 (0.3413) +03/20 17:08:31 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0033 (0.0751) +03/20 17:08:31 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 17:08:31 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0703 (8.8779) +03/20 17:08:31 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.2000) +03/20 17:08:31 INFO train_distill_dimo.py:1047] < PROGRESS: 2.01% | SPEED: 53.120s / step | ETA: 6 days, 0:36:20 > +03/20 17:08:42 INFO train_distill_dimo.py:1135] [save] step=200 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-200 (ZeRO-3 Gathered) +03/20 17:17:19 INFO train_distill_dimo.py:1149] Iteration 210, lr_s=4.21e-06 lr_a=4.21e-06, time=53.87s +03/20 17:17:19 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 17:17:19 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 17:17:19 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.5781 (3.5241) +03/20 17:17:19 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.1776 (0.3009) +03/20 17:17:19 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0171 (0.0301) +03/20 17:17:19 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 17:17:19 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.5399 (8.5858) +03/20 17:17:19 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.2000) +03/20 17:26:15 INFO train_distill_dimo.py:1149] Iteration 220, lr_s=4.41e-06 lr_a=4.41e-06, time=55.17s +03/20 17:26:15 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 17:26:15 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 17:26:15 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.0625 (5.1066) +03/20 17:26:15 INFO 
train_distill_dimo.py:1160] Train loss_kd_cond: 0.2072 (0.4004) +03/20 17:26:15 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0036 (0.0163) +03/20 17:26:15 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 17:26:15 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.5819 (8.4518) +03/20 17:26:15 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.4000) +03/20 17:34:49 INFO train_distill_dimo.py:1149] Iteration 230, lr_s=4.61e-06 lr_a=4.61e-06, time=50.46s +03/20 17:34:49 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 17:34:49 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 17:34:49 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.0781 (5.3236) +03/20 17:34:49 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5039 (1.2878) +03/20 17:34:49 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0078 (0.0435) +03/20 17:34:49 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 17:34:49 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.7627 (8.6648) +03/20 17:34:49 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.4000) +03/20 17:43:31 INFO train_distill_dimo.py:1149] Iteration 240, lr_s=4.81e-06 lr_a=4.81e-06, time=52.35s +03/20 17:43:31 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 17:43:31 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 17:43:31 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.4062 (6.6027) +03/20 17:43:31 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3687 (0.5367) +03/20 17:43:31 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0012 (0.0081) +03/20 17:43:31 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 17:43:31 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.2710 (7.5748) +03/20 17:43:31 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.0000) +03/20 17:52:03 INFO train_distill_dimo.py:1149] Iteration 250, lr_s=5.00e-06 lr_a=5.00e-06, time=50.84s +03/20 17:52:03 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 17:52:03 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 17:52:03 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.2500 (6.1031) +03/20 17:52:03 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.7084 (2.0959) +03/20 17:52:03 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0063 (0.0585) +03/20 17:52:03 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 17:52:03 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0730 (8.7502) +03/20 17:52:03 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.5000 (0.5000) +03/20 18:00:37 INFO train_distill_dimo.py:1149] Iteration 260, lr_s=5.20e-06 lr_a=5.20e-06, time=50.71s +03/20 18:00:37 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/20 18:00:37 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/20 18:00:37 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.0938 (7.3203) +03/20 18:00:37 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3790 (1.0781) +03/20 18:00:37 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0057 (0.0434) +03/20 18:00:37 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/20 18:00:37 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.6590 (8.1079) +03/20 18:00:37 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.1000) +03/20 18:09:40 INFO 
train_distill_dimo.py:1149] Iteration 270, lr_s=5.40e-06 lr_a=5.40e-06, time=52.08s
+03/20 18:09:40 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 18:09:40 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 18:09:40 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 2.7109 (2.8321)
+03/20 18:09:40 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.2477 (0.2987)
+03/20 18:09:40 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0163 (0.0271)
+03/20 18:09:40 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 18:09:40 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.7537 (8.1105)
+03/20 18:09:40 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.3000)
+03/20 18:18:09 INFO train_distill_dimo.py:1149] Iteration 280, lr_s=5.60e-06 lr_a=5.60e-06, time=50.25s
+03/20 18:18:09 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 18:18:09 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 18:18:09 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.7188 (3.9969)
+03/20 18:18:09 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4471 (1.6851)
+03/20 18:18:09 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0378 (0.0876)
+03/20 18:18:09 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 18:18:09 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.2325 (7.9807)
+03/20 18:18:09 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.6000)
+03/20 18:26:40 INFO train_distill_dimo.py:1149] Iteration 290, lr_s=5.80e-06 lr_a=5.80e-06, time=50.91s
+03/20 18:26:40 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 18:26:40 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 18:26:40 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.8281 (4.7258)
+03/20 18:26:40 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5236 (1.8957)
+03/20 18:26:40 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0142 (0.0768)
+03/20 18:26:40 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 18:26:40 INFO train_distill_dimo.py:1160] Train tok_entropy: 6.8661 (7.1907)
+03/20 18:26:40 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.3000)
+03/20 18:35:16 INFO train_distill_dimo.py:1149] Iteration 300, lr_s=6.00e-06 lr_a=6.00e-06, time=49.55s
+03/20 18:35:16 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 18:35:16 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 18:35:16 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.7969 (5.0883)
+03/20 18:35:16 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.1561 (0.2063)
+03/20 18:35:16 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0017 (0.0752)
+03/20 18:35:16 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 18:35:16 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.6758 (8.4828)
+03/20 18:35:16 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.4000)
+03/20 18:35:16 INFO train_distill_dimo.py:1047] < PROGRESS: 3.01% | SPEED: 52.726s / step | ETA: 5 days, 22:04:01 >
+03/20 18:35:27 INFO train_distill_dimo.py:1135] [save] step=300 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-300 (ZeRO-3 Gathered)
+03/20 18:44:04 INFO train_distill_dimo.py:1149] Iteration 310, lr_s=6.20e-06 lr_a=6.20e-06, time=56.72s
+03/20 18:44:04 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 18:44:04 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 18:44:04 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.1406 (6.4337)
+03/20 18:44:04 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.2757 (0.3852)
+03/20 18:44:04 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0009 (0.0483)
+03/20 18:44:04 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 18:44:04 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.1449 (7.9509)
+03/20 18:44:04 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.0000)
+03/20 18:52:51 INFO train_distill_dimo.py:1149] Iteration 320, lr_s=6.40e-06 lr_a=6.40e-06, time=54.19s
+03/20 18:52:51 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 18:52:51 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 18:52:51 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.8594 (7.4031)
+03/20 18:52:51 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5726 (2.1360)
+03/20 18:52:51 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0028 (0.0267)
+03/20 18:52:51 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 18:52:51 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.5220 (7.2196)
+03/20 18:52:51 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.4000)
+03/20 19:01:32 INFO train_distill_dimo.py:1149] Iteration 330, lr_s=6.60e-06 lr_a=6.60e-06, time=51.51s
+03/20 19:01:32 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 19:01:32 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 19:01:32 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.3750 (5.7266)
+03/20 19:01:32 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.2869 (2.0953)
+03/20 19:01:32 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0166 (0.0364)
+03/20 19:01:32 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 19:01:32 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.5136 (7.5490)
+03/20 19:01:32 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.5000 (0.5000)
+03/20 19:10:23 INFO train_distill_dimo.py:1149] Iteration 340, lr_s=6.80e-06 lr_a=6.80e-06, time=50.86s
+03/20 19:10:23 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 19:10:23 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 19:10:23 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.5312 (7.0609)
+03/20 19:10:23 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.8023 (1.8554)
+03/20 19:10:23 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0042 (0.0247)
+03/20 19:10:23 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 19:10:23 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.1942 (8.5427)
+03/20 19:10:23 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.3000)
+03/20 19:18:52 INFO train_distill_dimo.py:1149] Iteration 350, lr_s=7.00e-06 lr_a=7.00e-06, time=50.02s
+03/20 19:18:52 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 19:18:52 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 19:18:52 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.1406 (5.5865)
+03/20 19:18:52 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4453 (0.8680)
+03/20 19:18:52 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0043 (0.0164)
+03/20 19:18:52 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 19:18:52 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.3036 (7.5178)
+03/20 19:18:52 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.4000)
+03/20 19:27:38 INFO train_distill_dimo.py:1149] Iteration 360, lr_s=7.20e-06 lr_a=7.20e-06, time=49.97s
+03/20 19:27:38 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 19:27:38 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 19:27:38 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.5469 (5.7463)
+03/20 19:27:38 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4239 (0.9361)
+03/20 19:27:38 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0026 (0.0405)
+03/20 19:27:38 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 19:27:38 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.4235 (9.2831)
+03/20 19:27:38 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.3000)
+03/20 19:36:18 INFO train_distill_dimo.py:1149] Iteration 370, lr_s=7.40e-06 lr_a=7.40e-06, time=50.16s
+03/20 19:36:18 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 19:36:18 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 19:36:18 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.3750 (6.6340)
+03/20 19:36:18 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.1630 (0.3825)
+03/20 19:36:18 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0004 (0.0103)
+03/20 19:36:18 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 19:36:18 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.0353 (8.1436)
+03/20 19:36:18 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.2000)
+03/20 19:45:12 INFO train_distill_dimo.py:1149] Iteration 380, lr_s=7.60e-06 lr_a=7.60e-06, time=49.20s
+03/20 19:45:12 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 19:45:12 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 19:45:12 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.7734 (4.1093)
+03/20 19:45:12 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.9971 (1.1629)
+03/20 19:45:12 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0149 (0.0888)
+03/20 19:45:12 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 19:45:12 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.5787 (7.6705)
+03/20 19:45:12 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.5000 (0.5000)
+03/20 19:53:35 INFO train_distill_dimo.py:1149] Iteration 390, lr_s=7.80e-06 lr_a=7.80e-06, time=49.97s
+03/20 19:53:35 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 19:53:35 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 19:53:35 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 1.6875 (3.9728)
+03/20 19:53:35 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.1742 (2.2606)
+03/20 19:53:35 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0766 (0.1492)
+03/20 19:53:35 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 19:53:35 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.4794 (8.2919)
+03/20 19:53:35 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.6000)
+03/20 20:02:16 INFO train_distill_dimo.py:1149] Iteration 400, lr_s=8.00e-06 lr_a=8.00e-06, time=52.93s
+03/20 20:02:16 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 20:02:16 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 20:02:16 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.6875 (7.1594)
+03/20 20:02:16 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4675 (0.5972)
+03/20 20:02:16 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0012 (0.0077)
+03/20 20:02:16 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 20:02:16 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.7416 (8.6918)
+03/20 20:02:16 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.4000)
+03/20 20:02:16 INFO train_distill_dimo.py:1047] < PROGRESS: 4.01% | SPEED: 52.567s / step | ETA: 5 days, 20:10:46 >
+03/20 20:02:32 INFO train_distill_dimo.py:1135] [save] step=400 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-400 (ZeRO-3 Gathered)
+03/20 20:11:39 INFO train_distill_dimo.py:1149] Iteration 410, lr_s=8.20e-06 lr_a=8.20e-06, time=55.36s
+03/20 20:11:39 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 20:11:39 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 20:11:39 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.6562 (5.3926)
+03/20 20:11:39 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3614 (1.3399)
+03/20 20:11:39 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0029 (0.0389)
+03/20 20:11:39 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 20:11:39 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.5699 (8.2191)
+03/20 20:11:39 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.5000 (0.5000)
+03/20 20:20:26 INFO train_distill_dimo.py:1149] Iteration 420, lr_s=8.40e-06 lr_a=8.40e-06, time=53.39s
+03/20 20:20:26 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 20:20:26 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 20:20:26 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.8438 (4.9875)
+03/20 20:20:26 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.8894 (0.8077)
+03/20 20:20:26 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0200 (0.0464)
+03/20 20:20:26 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 20:20:26 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.3379 (8.4227)
+03/20 20:20:26 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.5000 (0.5000)
+03/20 20:28:58 INFO train_distill_dimo.py:1149] Iteration 430, lr_s=8.60e-06 lr_a=8.60e-06, time=52.37s
+03/20 20:28:58 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 20:28:58 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 20:28:58 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 8.2812 (7.7391)
+03/20 20:28:58 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.5692 (2.0525)
+03/20 20:28:58 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0017 (0.0263)
+03/20 20:28:58 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 20:28:58 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.6637 (8.6532)
+03/20 20:28:58 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.5000 (0.5000)
+03/20 20:37:26 INFO train_distill_dimo.py:1149] Iteration 440, lr_s=8.80e-06 lr_a=8.80e-06, time=53.01s
+03/20 20:37:26 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 20:37:26 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 20:37:26 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.0547 (5.0762)
+03/20 20:37:26 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 2.0761 (3.0130)
+03/20 20:37:26 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0603 (0.1178)
+03/20 20:37:26 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 20:37:26 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.9877 (7.9472)
+03/20 20:37:26 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.5000 (0.5000)
+03/20 20:46:06 INFO train_distill_dimo.py:1149] Iteration 450, lr_s=9.00e-06 lr_a=9.00e-06, time=51.34s
+03/20 20:46:06 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 20:46:06 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 20:46:06 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.4453 (4.3303)
+03/20 20:46:06 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.7000 (0.6591)
+03/20 20:46:06 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0145 (0.0801)
+03/20 20:46:06 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 20:46:06 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.4553 (8.3390)
+03/20 20:46:06 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.7000)
+03/20 20:54:39 INFO train_distill_dimo.py:1149] Iteration 460, lr_s=9.20e-06 lr_a=9.20e-06, time=51.77s
+03/20 20:54:39 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 20:54:39 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 20:54:39 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.3164 (4.6702)
+03/20 20:54:39 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5561 (1.6179)
+03/20 20:54:39 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0353 (0.0875)
+03/20 20:54:39 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 20:54:39 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.9226 (8.8103)
+03/20 20:54:39 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.6000)
+03/20 21:03:30 INFO train_distill_dimo.py:1149] Iteration 470, lr_s=9.40e-06 lr_a=9.40e-06, time=51.89s
+03/20 21:03:30 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 21:03:30 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 21:03:30 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.0000 (4.7554)
+03/20 21:03:30 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3759 (0.4341)
+03/20 21:03:30 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0108 (0.0643)
+03/20 21:03:30 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 21:03:30 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.4590 (8.5023)
+03/20 21:03:30 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.7000)
+03/20 21:12:12 INFO train_distill_dimo.py:1149] Iteration 480, lr_s=9.60e-06 lr_a=9.60e-06, time=50.54s
+03/20 21:12:12 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 21:12:12 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 21:12:12 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.2812 (6.9000)
+03/20 21:12:12 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.2920 (0.5822)
+03/20 21:12:12 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0008 (0.0072)
+03/20 21:12:12 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 21:12:12 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.4133 (8.5219)
+03/20 21:12:12 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.5000 (0.5000)
+03/20 21:21:05 INFO train_distill_dimo.py:1149] Iteration 490, lr_s=9.80e-06 lr_a=9.80e-06, time=56.57s
+03/20 21:21:05 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 21:21:05 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 21:21:05 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.4219 (4.6670)
+03/20 21:21:05 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3169 (0.3110)
+03/20 21:21:05 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0089 (0.0368)
+03/20 21:21:05 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 21:21:05 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.6194 (8.6053)
+03/20 21:21:05 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.5000 (0.5000)
+03/20 21:29:54 INFO train_distill_dimo.py:1149] Iteration 500, lr_s=1.00e-05 lr_a=1.00e-05, time=53.06s
+03/20 21:29:54 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 21:29:54 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 21:29:54 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.1406 (4.8288)
+03/20 21:29:54 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3498 (0.7741)
+03/20 21:29:54 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0086 (0.1107)
+03/20 21:29:54 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 21:29:54 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.3418 (9.2146)
+03/20 21:29:54 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.3000)
+03/20 21:29:54 INFO train_distill_dimo.py:1047] < PROGRESS: 5.01% | SPEED: 52.536s / step | ETA: 5 days, 18:38:15 >
+03/20 21:30:09 INFO train_distill_dimo.py:1135] [save] step=500 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-500 (ZeRO-3 Gathered)
+03/20 21:38:38 INFO train_distill_dimo.py:1149] Iteration 510, lr_s=1.00e-05 lr_a=1.00e-05, time=49.91s
+03/20 21:38:38 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 21:38:38 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 21:38:38 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.2656 (5.1479)
+03/20 21:38:38 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3186 (0.8893)
+03/20 21:38:38 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0037 (0.0159)
+03/20 21:38:38 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 21:38:38 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.3062 (8.4235)
+03/20 21:38:38 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.6000)
+03/20 21:47:19 INFO train_distill_dimo.py:1149] Iteration 520, lr_s=1.00e-05 lr_a=1.00e-05, time=49.62s
+03/20 21:47:19 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 21:47:19 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 21:47:19 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.5781 (6.0107)
+03/20 21:47:19 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.2187 (0.2968)
+03/20 21:47:19 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0015 (0.0491)
+03/20 21:47:19 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 21:47:19 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.4669 (9.3791)
+03/20 21:47:19 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.4000)
+03/20 21:56:13 INFO train_distill_dimo.py:1149] Iteration 530, lr_s=1.00e-05 lr_a=1.00e-05, time=53.78s
+03/20 21:56:13 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 21:56:13 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 21:56:13 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.9062 (4.1721)
+03/20 21:56:13 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3677 (0.3503)
+03/20 21:56:13 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0057 (0.0486)
+03/20 21:56:13 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 21:56:13 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.5172 (8.6402)
+03/20 21:56:13 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.8000)
+03/20 22:04:57 INFO train_distill_dimo.py:1149] Iteration 540, lr_s=1.00e-05 lr_a=1.00e-05, time=53.79s
+03/20 22:04:57 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 22:04:57 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 22:04:57 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.3047 (2.9571)
+03/20 22:04:57 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.2523 (0.2626)
+03/20 22:04:57 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0175 (0.1151)
+03/20 22:04:57 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 22:04:57 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.9469 (8.9179)
+03/20 22:04:57 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.5000 (0.5000)
+03/20 22:13:29 INFO train_distill_dimo.py:1149] Iteration 550, lr_s=1.00e-05 lr_a=1.00e-05, time=51.74s
+03/20 22:13:29 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 22:13:29 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 22:13:29 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.4062 (4.2518)
+03/20 22:13:29 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3284 (0.3526)
+03/20 22:13:29 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0289 (0.0823)
+03/20 22:13:29 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 22:13:29 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.2097 (8.1374)
+03/20 22:13:29 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.4000)
+03/20 22:22:03 INFO train_distill_dimo.py:1149] Iteration 560, lr_s=1.00e-05 lr_a=1.00e-05, time=51.62s
+03/20 22:22:03 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 22:22:03 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 22:22:03 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.2578 (3.8837)
+03/20 22:22:03 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.1026 (0.3620)
+03/20 22:22:03 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0287 (0.0765)
+03/20 22:22:03 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 22:22:03 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.4775 (8.4993)
+03/20 22:22:03 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.6000)
+03/20 22:30:48 INFO train_distill_dimo.py:1149] Iteration 570, lr_s=1.00e-05 lr_a=1.00e-05, time=52.66s
+03/20 22:30:48 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 22:30:48 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 22:30:48 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.0156 (5.8256)
+03/20 22:30:48 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3780 (0.7156)
+03/20 22:30:48 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0008 (0.0519)
+03/20 22:30:48 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 22:30:48 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.4118 (8.1132)
+03/20 22:30:48 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.3000)
+03/20 22:39:18 INFO train_distill_dimo.py:1149] Iteration 580, lr_s=1.00e-05 lr_a=1.00e-05, time=48.23s
+03/20 22:39:18 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 22:39:18 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 22:39:18 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.3125 (4.9645)
+03/20 22:39:18 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 2.1342 (2.7882)
+03/20 22:39:18 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0045 (0.1614)
+03/20 22:39:18 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 22:39:18 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.2813 (8.8220)
+03/20 22:39:18 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.5000 (0.5000)
+03/20 22:47:50 INFO train_distill_dimo.py:1149] Iteration 590, lr_s=1.00e-05 lr_a=1.00e-05, time=52.75s
+03/20 22:47:50 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 22:47:50 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 22:47:50 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 8.0781 (7.9297)
+03/20 22:47:50 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.8656 (1.4953)
+03/20 22:47:50 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0008 (0.0130)
+03/20 22:47:50 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 22:47:50 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.8385 (8.0629)
+03/20 22:47:50 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.4000)
+03/20 22:56:03 INFO train_distill_dimo.py:1149] Iteration 600, lr_s=1.00e-05 lr_a=1.00e-05, time=48.87s
+03/20 22:56:03 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 22:56:03 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 22:56:03 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.8750 (6.6719)
+03/20 22:56:03 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4379 (0.7044)
+03/20 22:56:03 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0021 (0.0040)
+03/20 22:56:03 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 22:56:03 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.2141 (7.1847)
+03/20 22:56:03 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.5000 (0.5000)
+03/20 22:56:03 INFO train_distill_dimo.py:1047] < PROGRESS: 6.01% | SPEED: 52.370s / step | ETA: 5 days, 16:44:40 >
+03/20 22:56:17 INFO train_distill_dimo.py:1135] [save] step=600 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-600 (ZeRO-3 Gathered)
+03/20 23:04:31 INFO train_distill_dimo.py:1149] Iteration 610, lr_s=1.00e-05 lr_a=1.00e-05, time=48.69s
+03/20 23:04:31 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 23:04:31 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 23:04:31 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.3125 (4.9547)
+03/20 23:04:31 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.1092 (1.8306)
+03/20 23:04:31 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0099 (0.0421)
+03/20 23:04:31 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 23:04:31 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.5002 (7.3132)
+03/20 23:04:31 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.5000 (0.5000)
+03/20 23:13:11 INFO train_distill_dimo.py:1149] Iteration 620, lr_s=1.00e-05 lr_a=1.00e-05, time=54.57s
+03/20 23:13:11 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 23:13:11 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 23:13:11 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.4766 (4.5876)
+03/20 23:13:11 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4736 (0.8435)
+03/20 23:13:11 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0050 (0.0870)
+03/20 23:13:11 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 23:13:11 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.2300 (8.8709)
+03/20 23:13:11 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.5000 (0.5000)
+03/20 23:21:38 INFO train_distill_dimo.py:1149] Iteration 630, lr_s=1.00e-05 lr_a=1.00e-05, time=51.45s
+03/20 23:21:38 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 23:21:38 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 23:21:38 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.0469 (5.2715)
+03/20 23:21:38 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5475 (0.8387)
+03/20 23:21:38 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0056 (0.0998)
+03/20 23:21:38 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 23:21:38 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0111 (8.8768)
+03/20 23:21:38 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.6000)
+03/20 23:30:00 INFO train_distill_dimo.py:1149] Iteration 640, lr_s=1.00e-05 lr_a=1.00e-05, time=50.44s
+03/20 23:30:00 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 23:30:00 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 23:30:00 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.5156 (6.5961)
+03/20 23:30:00 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.7548 (2.3618)
+03/20 23:30:00 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0016 (0.0227)
+03/20 23:30:00 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 23:30:00 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.4782 (7.8862)
+03/20 23:30:00 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.7000)
+03/20 23:38:21 INFO train_distill_dimo.py:1149] Iteration 650, lr_s=9.99e-06 lr_a=9.99e-06, time=48.60s
+03/20 23:38:21 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 23:38:21 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 23:38:21 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.3906 (3.9879)
+03/20 23:38:21 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5081 (0.5712)
+03/20 23:38:21 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0262 (0.0507)
+03/20 23:38:21 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 23:38:21 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.0331 (8.1997)
+03/20 23:38:21 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.6000)
+03/20 23:46:49 INFO train_distill_dimo.py:1149] Iteration 660, lr_s=9.99e-06 lr_a=9.99e-06, time=50.64s
+03/20 23:46:49 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 23:46:49 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 23:46:49 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.7656 (7.0561)
+03/20 23:46:49 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5751 (1.1930)
+03/20 23:46:49 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0009 (0.0149)
+03/20 23:46:49 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 23:46:49 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.9221 (8.7731)
+03/20 23:46:49 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.7000)
+03/20 23:55:22 INFO train_distill_dimo.py:1149] Iteration 670, lr_s=9.99e-06 lr_a=9.99e-06, time=49.77s
+03/20 23:55:22 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/20 23:55:22 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/20 23:55:22 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.1875 (5.7402)
+03/20 23:55:22 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5283 (0.8396)
+03/20 23:55:22 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0014 (0.0543)
+03/20 23:55:22 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/20 23:55:22 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.9596 (8.8516)
+03/20 23:55:22 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.5000 (0.5000)
+03/21 00:03:50 INFO train_distill_dimo.py:1149] Iteration 680, lr_s=9.99e-06 lr_a=9.99e-06, time=49.72s
+03/21 00:03:50 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 00:03:50 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 00:03:50 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.1250 (4.9961)
+03/21 00:03:50 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5336 (0.4825)
+03/21 00:03:50 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0088 (0.0414)
+03/21 00:03:50 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 00:03:50 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.0695 (8.1205)
+03/21 00:03:50 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.7000)
+03/21 00:12:48 INFO train_distill_dimo.py:1149] Iteration 690, lr_s=9.99e-06 lr_a=9.99e-06, time=55.72s
+03/21 00:12:48 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 00:12:48 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 00:12:48 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.6406 (5.4354)
+03/21 00:12:48 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.4410 (3.4990)
+03/21 00:12:48 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0091 (0.0444)
+03/21 00:12:48 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 00:12:48 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.4946 (8.4583)
+03/21 00:12:48 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.8000)
+03/21 00:21:17 INFO train_distill_dimo.py:1149] Iteration 700, lr_s=9.99e-06 lr_a=9.99e-06, time=47.42s
+03/21 00:21:17 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 00:21:17 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 00:21:17 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.2969 (6.1445)
+03/21 00:21:17 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 3.7805 (3.1373)
+03/21 00:21:17 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0130 (0.0363)
+03/21 00:21:17 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 00:21:17 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.4358 (8.2429)
+03/21 00:21:17 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.6000)
+03/21 00:21:17 INFO train_distill_dimo.py:1047] < PROGRESS: 7.01% | SPEED: 52.175s / step | ETA: 5 days, 14:47:08 >
+03/21 00:21:28 INFO train_distill_dimo.py:1135] [save] step=700 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-700 (ZeRO-3 Gathered)
+03/21 00:29:55 INFO train_distill_dimo.py:1149] Iteration 710, lr_s=9.99e-06 lr_a=9.99e-06, time=56.10s
+03/21 00:29:55 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 00:29:55 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 00:29:55 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.3906 (4.9639)
+03/21 00:29:55 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 3.2284 (3.1572)
+03/21 00:29:55 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0032 (0.1016)
+03/21 00:29:55 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 00:29:55 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.6124 (8.4211)
+03/21 00:29:55 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.8000)
+03/21 00:38:23 INFO train_distill_dimo.py:1149] Iteration 720, lr_s=9.99e-06 lr_a=9.99e-06, time=49.56s
+03/21 00:38:23 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 00:38:23 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 00:38:23 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.3906 (5.6802)
+03/21 00:38:23 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 3.0664 (3.2094)
+03/21 00:38:23 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0038 (0.0158)
+03/21 00:38:23 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 00:38:23 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.6941 (8.7376)
+03/21 00:38:23 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.6000)
+03/21 00:47:02 INFO train_distill_dimo.py:1149] Iteration 730, lr_s=9.99e-06 lr_a=9.99e-06, time=51.46s
+03/21 00:47:02 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 00:47:02 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 00:47:02 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 8.0469 (6.2560)
+03/21 00:47:02 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 2.8157 (3.4647)
+03/21 00:47:02 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0095 (0.0504)
+03/21 00:47:02 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 00:47:02 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0946 (9.0302)
+03/21 00:47:02 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.6000)
+03/21 00:55:26 INFO train_distill_dimo.py:1149] Iteration 740, lr_s=9.99e-06 lr_a=9.99e-06, time=48.33s
+03/21 00:55:26 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 00:55:26 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 00:55:26 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.2969 (4.8607)
+03/21 00:55:26 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 4.0555 (3.5782)
+03/21 00:55:26 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0107 (0.0637)
+03/21 00:55:26 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 00:55:26 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.7647 (8.7329)
+03/21 00:55:26 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.9000)
+03/21 01:03:36 INFO train_distill_dimo.py:1149] Iteration 750, lr_s=9.98e-06 lr_a=9.98e-06, time=49.61s
+03/21 01:03:36 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 01:03:36 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 01:03:36 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.0859 (3.3724)
+03/21 01:03:36 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 3.9940 (4.7660)
+03/21 01:03:36 INFO train_distill_dimo.py:1160] Train loss_pg: 0.1356 (0.1455)
+03/21 01:03:36 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 01:03:36 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.1104 (8.6605)
+03/21 01:03:36 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 0.0000 (0.4000)
+03/21 01:11:53 INFO train_distill_dimo.py:1149] Iteration 760, lr_s=9.98e-06 lr_a=9.98e-06, time=56.22s
+03/21 01:11:53 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 01:11:53 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 01:11:53 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 8.1719 (6.2194)
+03/21 01:11:53 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 2.8020 (3.2959)
+03/21 01:11:53 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0008 (0.0457)
+03/21 01:11:53 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 01:11:53 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.9531 (8.9775)
+03/21 01:11:53 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.9000)
+03/21 01:20:16 INFO train_distill_dimo.py:1149] Iteration 770, lr_s=9.98e-06 lr_a=9.98e-06, time=50.72s
+03/21 01:20:16 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 01:20:16 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 01:20:16 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.0234 (3.9792)
+03/21 01:20:16 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3649 (0.4375)
+03/21 01:20:16 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0367 (0.0645)
+03/21 01:20:16 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 01:20:16 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.6789 (8.6369)
+03/21 01:20:16 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.9000)
+03/21 01:28:38 INFO train_distill_dimo.py:1149] Iteration 780, lr_s=9.98e-06 lr_a=9.98e-06, time=52.74s
+03/21 01:28:38 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 01:28:38 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 01:28:38 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.1562 (4.3037)
+03/21 01:28:38 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3108 (0.4467)
+03/21 01:28:38 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0144 (0.0329)
+03/21 01:28:38 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 01:28:38 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.3318 (8.4178)
+03/21 01:28:38 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.7000)
+03/21 01:37:00 INFO train_distill_dimo.py:1149] Iteration 790, lr_s=9.98e-06 lr_a=9.98e-06, time=51.81s
+03/21 01:37:00 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 01:37:00 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 01:37:00 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.0625 (6.3652)
+03/21 01:37:00 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.6951 (1.4976)
+03/21 01:37:00 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0015 (0.0623)
+03/21 01:37:00 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 01:37:00 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0548 (8.5432)
+03/21 01:37:00 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.9000)
+03/21 01:45:24 INFO train_distill_dimo.py:1149] Iteration 800, lr_s=9.98e-06 lr_a=9.98e-06, time=50.14s
+03/21 01:45:24 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 01:45:24 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 01:45:24 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.2969 (5.6797)
+03/21 01:45:24 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5254 (0.7319)
+03/21 01:45:24 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0058 (0.0141)
+03/21 01:45:24 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 01:45:24 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.1310 (9.1744)
+03/21 01:45:24 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.8000)
+03/21 01:45:24 INFO train_distill_dimo.py:1047] < PROGRESS: 8.01% | SPEED: 51.948s / step | ETA: 5 days, 12:45:21 >
+03/21 01:45:35 INFO train_distill_dimo.py:1135] [save] step=800 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-800 (ZeRO-3 Gathered)
+03/21 01:53:55 INFO train_distill_dimo.py:1149] Iteration 810, lr_s=9.98e-06 lr_a=9.98e-06, time=49.23s
+03/21 01:53:55 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 01:53:55 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 01:53:55 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.8438 (5.1625)
+03/21 01:53:55 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4458 (0.5061)
+03/21 01:53:55 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0081 (0.0160)
+03/21 01:53:55 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 01:53:55 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.8993 (7.9783)
+03/21 01:53:55 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.8000)
+03/21 02:02:20 INFO train_distill_dimo.py:1149] Iteration 820, lr_s=9.97e-06 lr_a=9.97e-06, time=49.27s
+03/21 02:02:20 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 02:02:20 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 02:02:20 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.0625 (4.0737)
+03/21 02:02:20 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5663 (1.5458)
+03/21 02:02:20 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0311 (0.0779)
+03/21 02:02:20 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 02:02:20 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0022 (8.5980)
+03/21 02:02:20 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.7000)
+03/21 02:10:56 INFO train_distill_dimo.py:1149] Iteration 830, lr_s=9.97e-06 lr_a=9.97e-06, time=48.49s
+03/21 02:10:56 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 02:10:56 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 02:10:56 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 2.5000 (3.5575)
+03/21 02:10:56 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.0996 (1.2337)
+03/21 02:10:56 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0364 (0.0658)
+03/21 02:10:56 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 02:10:56 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.9993 (8.1055)
+03/21 02:10:56 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.7000)
+03/21 02:19:11 INFO train_distill_dimo.py:1149] Iteration 840, lr_s=9.97e-06 lr_a=9.97e-06, time=50.24s
+03/21 02:19:11 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 02:19:11 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 02:19:11 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.7188 (6.1085)
+03/21 02:19:11 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.6933 (0.8198)
+03/21 02:19:11 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0018 (0.0652)
+03/21 02:19:11 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 02:19:11 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0085 (8.9369)
+03/21 02:19:11 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 02:27:20 INFO train_distill_dimo.py:1149] Iteration 850, lr_s=9.97e-06 lr_a=9.97e-06, time=48.34s
+03/21 02:27:20 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 02:27:20 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 02:27:20 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.5938 (4.9156)
+03/21 02:27:20 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.9891 (2.1167)
+03/21 02:27:20 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0187 (0.0657)
+03/21 02:27:20 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 02:27:20 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.5288 (7.0623)
+03/21 02:27:20 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.7000)
+03/21 02:35:39 INFO train_distill_dimo.py:1149] Iteration 860, lr_s=9.97e-06 lr_a=9.97e-06, time=48.57s
+03/21 02:35:39 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 02:35:39 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 02:35:39 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.1719 (4.6172)
+03/21 02:35:39 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5042 (2.6606)
+03/21 02:35:39 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0405 (0.0701)
+03/21 02:35:39 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 02:35:39 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.1831 (8.0401)
+03/21 02:35:39 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 02:44:02 INFO train_distill_dimo.py:1149] Iteration 870, lr_s=9.97e-06 lr_a=9.97e-06, time=50.23s
+03/21 02:44:02 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 02:44:02 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 02:44:02 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.2969 (7.3281)
+03/21 02:44:02 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.5579 (2.0979)
+03/21 02:44:02 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0036 (0.0138)
+03/21 02:44:02 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 02:44:02 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.7058 (9.2514)
+03/21 02:44:02 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.9000)
+03/21 02:52:19 INFO train_distill_dimo.py:1149] Iteration 880, lr_s=9.96e-06 lr_a=9.96e-06, time=48.85s
+03/21 02:52:19 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 02:52:19 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 02:52:19 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.4688 (6.1328)
+03/21 02:52:19 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.6353 (0.9407)
+03/21 02:52:19 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0041 (0.0113)
+03/21 02:52:19 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 02:52:19 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.2991 (8.0127)
+03/21 02:52:19 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.9000)
+03/21 03:00:58 INFO train_distill_dimo.py:1149] Iteration 890, lr_s=9.96e-06 lr_a=9.96e-06, time=50.71s
+03/21 03:00:58 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 03:00:58 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 03:00:58 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 0.8242 (3.2000)
+03/21 03:00:58 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.0176 (1.2019)
+03/21 03:00:58 INFO train_distill_dimo.py:1160] Train loss_pg: 0.1611 (0.1480)
+03/21 03:00:58 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 03:00:58 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0379 (9.1614)
+03/21 03:00:58 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.7000)
+03/21 03:09:16 INFO train_distill_dimo.py:1149] Iteration 900, lr_s=9.96e-06 lr_a=9.96e-06, time=49.39s
+03/21 03:09:16 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 03:09:16 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 03:09:16 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.1875 (5.2515)
+03/21 03:09:16 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.2909 (2.2523)
+03/21 03:09:16 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0197 (0.0603)
+03/21 03:09:16 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 03:09:16 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.8930 (7.9355)
+03/21 03:09:16 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 03:09:16 INFO train_distill_dimo.py:1047] < PROGRESS: 9.01% | SPEED: 51.755s / step | ETA: 5 days, 10:49:30 >
+03/21 03:09:28 INFO train_distill_dimo.py:1135] [save] step=900 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-900 (ZeRO-3 Gathered)
+03/21 03:17:37 INFO train_distill_dimo.py:1149] Iteration 910, lr_s=9.96e-06 lr_a=9.96e-06, time=48.72s
+03/21 03:17:37 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 03:17:37 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 03:17:37 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.7031 (4.6191)
+03/21 03:17:37 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.9969 (1.0299)
+03/21 03:17:37 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0317 (0.0831)
+03/21 03:17:37 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 03:17:37 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.7472 (8.5308)
+03/21 03:17:37 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.9000)
+03/21 03:26:03 INFO train_distill_dimo.py:1149] Iteration 920, lr_s=9.96e-06 lr_a=9.96e-06, time=49.74s
+03/21 03:26:03 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 03:26:03 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 03:26:03 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.6250 (5.4064)
+03/21 03:26:03 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4713 (0.9081)
+03/21 03:26:03 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0042 (0.0172)
+03/21 03:26:03 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 03:26:03 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.5830 (8.7392)
+03/21 03:26:03 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 03:34:41 INFO train_distill_dimo.py:1149] Iteration 930, lr_s=9.95e-06 lr_a=9.95e-06, time=53.09s
+03/21 03:34:41 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 03:34:41 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 03:34:41 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.8594 (5.1877)
+03/21 03:34:41 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5502 (0.5302)
+03/21 03:34:41 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0067 (0.0450)
+03/21 03:34:41 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 03:34:41 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.8582 (8.7894)
+03/21 03:34:41 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 03:43:18 INFO train_distill_dimo.py:1149] Iteration 940, lr_s=9.95e-06 lr_a=9.95e-06, time=53.51s
+03/21 03:43:18 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 03:43:18 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 03:43:18 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.4531 (5.0248)
+03/21 03:43:18 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.0258 (1.3694)
+03/21 03:43:18 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0681 (0.1183)
+03/21 03:43:18 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 03:43:18 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0582 (8.6132)
+03/21 03:43:18 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 03:52:06 INFO train_distill_dimo.py:1149] Iteration 950, lr_s=9.95e-06 lr_a=9.95e-06, time=48.88s
+03/21 03:52:06 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 03:52:06 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 03:52:06 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.0469 (4.3788)
+03/21 03:52:06 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.6584 (0.9547)
+03/21 03:52:06 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0249 (0.0883)
+03/21 03:52:06 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 03:52:06 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0358 (8.7564)
+03/21 03:52:06 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.9000)
+03/21 04:00:29 INFO train_distill_dimo.py:1149] Iteration 960, lr_s=9.95e-06 lr_a=9.95e-06, time=50.94s
+03/21 04:00:29 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 04:00:29 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 04:00:29 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.5938 (5.2988)
+03/21 04:00:29 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.6202 (0.9911)
+03/21 04:00:29 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0058 (0.0386)
+03/21 04:00:29 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 04:00:29 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.4381 (9.3586)
+03/21 04:00:29 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (0.8000)
+03/21 04:08:50 INFO train_distill_dimo.py:1149] Iteration 970, lr_s=9.95e-06 lr_a=9.95e-06, time=52.72s
+03/21 04:08:50 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 04:08:50 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 04:08:50 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.8438 (4.9241)
+03/21 04:08:50 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5214 (0.8155)
+03/21 04:08:50 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0070 (0.1255)
+03/21 04:08:50 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 04:08:50 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0723 (9.0317)
+03/21 04:08:50 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 04:17:25 INFO train_distill_dimo.py:1149] Iteration 980, lr_s=9.94e-06 lr_a=9.94e-06, time=50.46s
+03/21 04:17:25 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 04:17:25 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 04:17:25 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.6250 (5.7242)
+03/21 04:17:25 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5582 (0.7142)
+03/21 04:17:25 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0039 (0.0344)
+03/21 04:17:25 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 04:17:25 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.3432 (8.4103)
+03/21 04:17:25 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 04:25:47 INFO train_distill_dimo.py:1149] Iteration 990, lr_s=9.94e-06 lr_a=9.94e-06, time=49.57s
+03/21 04:25:47 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 04:25:47 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 04:25:47 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.7656 (6.6516)
+03/21 04:25:47 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5575 (1.0544)
+03/21 04:25:47 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0065 (0.0130)
+03/21 04:25:47 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 04:25:47 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.4408 (8.6227)
+03/21 04:25:47 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 04:34:10 INFO train_distill_dimo.py:1149] Iteration 1000, lr_s=9.94e-06 lr_a=9.94e-06, time=50.53s
+03/21 04:34:10 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 04:34:10 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 04:34:10 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.4766 (3.7218)
+03/21 04:34:10 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4228 (0.6350)
+03/21 04:34:10 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0286 (0.1252)
+03/21 04:34:10 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 04:34:10 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0970 (8.8408)
+03/21 04:34:10 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 04:34:10 INFO train_distill_dimo.py:1047] < PROGRESS: 10.01% | SPEED: 51.661s / step | ETA: 5 days, 9:09:11 >
+03/21 04:34:21 INFO train_distill_dimo.py:1135] [save] step=1000 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-1000 (ZeRO-3 Gathered)
+03/21 04:42:51 INFO train_distill_dimo.py:1149] Iteration 1010, lr_s=9.94e-06 lr_a=9.94e-06, time=56.29s
+03/21 04:42:51 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 04:42:51 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 04:42:51 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.9844 (5.3128)
+03/21 04:42:51 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4607 (0.8884)
+03/21 04:42:51 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0023 (0.0243)
+03/21 04:42:51 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 04:42:51 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0654 (8.8473)
+03/21 04:42:51 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 04:51:20 INFO train_distill_dimo.py:1149] Iteration 1020, lr_s=9.93e-06 lr_a=9.93e-06, time=49.73s
+03/21 04:51:20 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 04:51:20 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 04:51:20 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.4297 (3.7573)
+03/21 04:51:20 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3491 (0.4412)
+03/21 04:51:20 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0333 (0.0932)
+03/21 04:51:20 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 04:51:20 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.8279 (8.8823)
+03/21 04:51:20 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 04:59:47 INFO train_distill_dimo.py:1149] Iteration 1030, lr_s=9.93e-06 lr_a=9.93e-06, time=49.54s
+03/21 04:59:47 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 04:59:47 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 04:59:47 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.4688 (6.0086)
+03/21 04:59:47 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4723 (0.6985)
+03/21 04:59:47 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0022 (0.0177)
+03/21 04:59:47 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 04:59:47 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.8575 (8.8617)
+03/21 04:59:47 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 05:08:06 INFO train_distill_dimo.py:1149] Iteration 1040, lr_s=9.93e-06 lr_a=9.93e-06, time=52.30s
+03/21 05:08:06 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 05:08:06 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 05:08:06 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.6406 (5.6644)
+03/21 05:08:06 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4242 (0.4055)
+03/21 05:08:06 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0023 (0.0427)
+03/21 05:08:06 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 05:08:06 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.8756 (8.9090)
+03/21 05:08:06 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 05:16:48 INFO train_distill_dimo.py:1149] Iteration 1050, lr_s=9.93e-06 lr_a=9.93e-06, time=49.49s
+03/21 05:16:48 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 05:16:48 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 05:16:48 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.6719 (6.0906)
+03/21 05:16:48 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5504 (0.7777)
+03/21 05:16:48 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0035 (0.0616)
+03/21 05:16:48 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 05:16:48 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0708 (9.0734)
+03/21 05:16:48 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 05:25:05 INFO train_distill_dimo.py:1149] Iteration 1060, lr_s=9.92e-06 lr_a=9.92e-06, time=50.78s
+03/21 05:25:05 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 05:25:05 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 05:25:05 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.7656 (6.0252)
+03/21 05:25:05 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.0919 (1.9229)
+03/21 05:25:05 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0039 (0.0677)
+03/21 05:25:05 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 05:25:05 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.9538 (8.9999)
+03/21 05:25:05 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 05:33:42 INFO train_distill_dimo.py:1149] Iteration 1070, lr_s=9.92e-06 lr_a=9.92e-06, time=59.13s
+03/21 05:33:42 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 05:33:42 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 05:33:42 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.1719 (5.3824)
+03/21 05:33:42 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 2.3825 (3.1181)
+03/21 05:33:42 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0037 (0.0670)
+03/21 05:33:42 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 05:33:42 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.8178 (8.8305)
+03/21 05:33:42 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 05:41:57 INFO train_distill_dimo.py:1149] Iteration 1080, lr_s=9.92e-06 lr_a=9.92e-06, time=48.33s
+03/21 05:41:57 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 05:41:57 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 05:41:57 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.3750 (5.8163)
+03/21 05:41:57 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 2.3020 (3.0523)
+03/21 05:41:57 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0083 (0.0602)
+03/21 05:41:57 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 05:41:57 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.6364 (8.6606)
+03/21 05:41:57 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 05:50:29 INFO train_distill_dimo.py:1149] Iteration 1090, lr_s=9.91e-06 lr_a=9.91e-06, time=53.91s
+03/21 05:50:29 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 05:50:29 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 05:50:29 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 8.2812 (7.0293)
+03/21 05:50:29 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 2.4450 (2.2095)
+03/21 05:50:29 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0011 (0.0370)
+03/21 05:50:29 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 05:50:29 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.9990 (8.6759)
+03/21 05:50:29 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 05:58:54 INFO train_distill_dimo.py:1149] Iteration 1100, lr_s=9.91e-06 lr_a=9.91e-06, time=48.87s
+03/21 05:58:54 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 05:58:54 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 05:58:54 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.9688 (5.9730)
+03/21 05:58:54 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.2445 (1.8358)
+03/21 05:58:54 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0096 (0.0470)
+03/21 05:58:54 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 05:58:54 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.3465 (8.2718)
+03/21 05:58:54 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 05:58:54 INFO train_distill_dimo.py:1047] < PROGRESS: 11.01% | SPEED: 51.576s / step | ETA: 5 days, 7:30:28 >
+03/21 05:59:06 INFO train_distill_dimo.py:1135] [save] step=1100 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-1100 (ZeRO-3 Gathered)
+03/21 06:07:42 INFO train_distill_dimo.py:1149] Iteration 1110, lr_s=9.91e-06 lr_a=9.91e-06, time=49.20s
+03/21 06:07:42 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 06:07:42 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 06:07:42 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.2500 (5.0639)
+03/21 06:07:42 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.2212 (1.2558)
+03/21 06:07:42 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0120 (0.0899)
+03/21 06:07:42 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 06:07:42 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.1116 (7.1515)
+03/21 06:07:42 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 06:16:01 INFO train_distill_dimo.py:1149] Iteration 1120, lr_s=9.91e-06 lr_a=9.91e-06, time=49.31s
+03/21 06:16:01 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 06:16:01 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 06:16:01 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.0000 (4.1484)
+03/21 06:16:01 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5781 (0.7304)
+03/21 06:16:01 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0221 (0.0522)
+03/21 06:16:01 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 06:16:01 INFO train_distill_dimo.py:1160] Train tok_entropy: 7.5319 (7.6026)
+03/21 06:16:01 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 06:24:22 INFO train_distill_dimo.py:1149] Iteration 1130, lr_s=9.90e-06 lr_a=9.90e-06, time=51.17s
+03/21 06:24:22 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 06:24:22 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 06:24:22 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.4062 (5.5313)
+03/21 06:24:22 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5145 (0.8243)
+03/21 06:24:22 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0066 (0.0809)
+03/21 06:24:22 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 06:24:22 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.1579 (9.0816)
+03/21 06:24:22 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 06:33:03 INFO train_distill_dimo.py:1149] Iteration 1140, lr_s=9.90e-06 lr_a=9.90e-06, time=54.47s
+03/21 06:33:03 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000)
+03/21 06:33:03 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000)
+03/21 06:33:03 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 0.3325 (2.9070)
+03/21 06:33:03 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.2157 (0.5905)
+03/21 06:33:03 INFO train_distill_dimo.py:1160] Train loss_pg: 0.2343 (0.1675)
+03/21 06:33:03 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000)
+03/21 06:33:03 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.5077 (8.4986)
+03/21 06:33:03 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000)
+03/21 06:41:28 INFO train_distill_dimo.py:1149] Iteration 1150, lr_s=9.90e-06 lr_a=9.90e-06, time=50.52s
+03/21 06:41:28 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 06:41:28 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 06:41:28 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.6250 (4.7061) +03/21 06:41:28 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5419 (1.2972) +03/21 06:41:28 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0245 (0.0750) +03/21 06:41:28 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 06:41:28 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.8640 (8.7782) +03/21 06:41:28 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 06:49:59 INFO train_distill_dimo.py:1149] Iteration 1160, lr_s=9.89e-06 lr_a=9.89e-06, time=49.22s +03/21 06:49:59 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 06:49:59 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 06:49:59 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.2578 (3.1909) +03/21 06:49:59 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3566 (0.3485) +03/21 06:49:59 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0254 (0.1088) +03/21 06:49:59 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 06:49:59 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0578 (8.9943) +03/21 06:49:59 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 06:58:06 INFO train_distill_dimo.py:1149] Iteration 1170, lr_s=9.89e-06 lr_a=9.89e-06, time=47.43s +03/21 06:58:06 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 06:58:06 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 06:58:06 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.7031 (5.2227) +03/21 06:58:06 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4701 (1.9948) +03/21 06:58:06 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0067 (0.0391) +03/21 06:58:06 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 06:58:06 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.7064 (8.6947) +03/21 06:58:06 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 07:06:55 INFO train_distill_dimo.py:1149] Iteration 1180, lr_s=9.89e-06 lr_a=9.89e-06, time=51.31s +03/21 07:06:55 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 07:06:55 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 07:06:55 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.6875 (5.6063) +03/21 07:06:55 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.2435 (1.5048) +03/21 07:06:55 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0021 (0.0258) +03/21 07:06:55 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 07:06:55 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.0492 (7.7321) +03/21 07:06:55 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 07:15:11 INFO train_distill_dimo.py:1149] Iteration 1190, lr_s=9.88e-06 lr_a=9.88e-06, time=51.36s +03/21 07:15:11 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 07:15:11 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 07:15:11 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.5312 (5.0159) +03/21 07:15:11 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.6111 (0.9136) +03/21 07:15:11 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0073 (0.0686) +03/21 07:15:11 
INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 07:15:11 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.7056 (8.7281) +03/21 07:15:11 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 07:23:48 INFO train_distill_dimo.py:1149] Iteration 1200, lr_s=9.88e-06 lr_a=9.88e-06, time=48.61s +03/21 07:23:48 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 07:23:48 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 07:23:48 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.1094 (5.3088) +03/21 07:23:48 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.8130 (1.0152) +03/21 07:23:48 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0072 (0.0614) +03/21 07:23:48 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 07:23:48 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.5569 (8.6073) +03/21 07:23:48 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 07:23:48 INFO train_distill_dimo.py:1047] < PROGRESS: 12.01% | SPEED: 51.513s / step | ETA: 5 days, 5:55:15 > +03/21 07:24:00 INFO train_distill_dimo.py:1135] [save] step=1200 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-1200 (ZeRO-3 Gathered) +03/21 07:32:31 INFO train_distill_dimo.py:1149] Iteration 1210, lr_s=9.88e-06 lr_a=9.88e-06, time=48.24s +03/21 07:32:31 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 07:32:31 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 07:32:31 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.2812 (6.3797) +03/21 07:32:31 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.2355 (1.2646) +03/21 07:32:31 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0072 (0.0132) +03/21 07:32:31 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 07:32:31 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.7211 (8.5768) +03/21 07:32:31 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 07:40:48 INFO train_distill_dimo.py:1149] Iteration 1220, lr_s=9.87e-06 lr_a=9.87e-06, time=50.80s +03/21 07:40:48 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 07:40:48 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 07:40:48 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.0312 (6.9078) +03/21 07:40:48 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.2160 (2.5422) +03/21 07:40:48 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0027 (0.0367) +03/21 07:40:48 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 07:40:48 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.5084 (8.9216) +03/21 07:40:48 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 07:49:02 INFO train_distill_dimo.py:1149] Iteration 1230, lr_s=9.87e-06 lr_a=9.87e-06, time=49.64s +03/21 07:49:02 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 07:49:02 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 07:49:02 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.2656 (5.4934) +03/21 07:49:02 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.0400 (1.3954) +03/21 07:49:02 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0089 (0.0502) +03/21 07:49:02 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 07:49:02 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.4262 (8.4967) +03/21 07:49:02 
INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 07:57:35 INFO train_distill_dimo.py:1149] Iteration 1240, lr_s=9.87e-06 lr_a=9.87e-06, time=49.13s +03/21 07:57:35 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 07:57:35 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 07:57:35 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.1875 (4.4304) +03/21 07:57:35 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5719 (0.6195) +03/21 07:57:35 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0045 (0.0875) +03/21 07:57:35 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 07:57:35 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.8116 (8.9537) +03/21 07:57:35 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 08:06:03 INFO train_distill_dimo.py:1149] Iteration 1250, lr_s=9.86e-06 lr_a=9.86e-06, time=49.80s +03/21 08:06:03 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 08:06:03 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 08:06:03 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.9219 (5.7359) +03/21 08:06:03 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.7264 (1.2820) +03/21 08:06:03 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0070 (0.0576) +03/21 08:06:03 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 08:06:03 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.3038 (9.1118) +03/21 08:06:03 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 08:14:19 INFO train_distill_dimo.py:1149] Iteration 1260, lr_s=9.86e-06 lr_a=9.86e-06, time=51.73s +03/21 08:14:19 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 08:14:19 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 08:14:19 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.4062 (5.3852) +03/21 08:14:19 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4785 (0.5053) +03/21 08:14:19 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0023 (0.0295) +03/21 08:14:19 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 08:14:19 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.9166 (8.7797) +03/21 08:14:19 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 08:22:46 INFO train_distill_dimo.py:1149] Iteration 1270, lr_s=9.86e-06 lr_a=9.86e-06, time=48.94s +03/21 08:22:46 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 08:22:46 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 08:22:46 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.3594 (3.1872) +03/21 08:22:46 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4078 (0.4185) +03/21 08:22:46 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0372 (0.0998) +03/21 08:22:46 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 08:22:46 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.2267 (9.0754) +03/21 08:22:46 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 08:31:08 INFO train_distill_dimo.py:1149] Iteration 1280, lr_s=9.85e-06 lr_a=9.85e-06, time=50.72s +03/21 08:31:08 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 08:31:08 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 08:31:08 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.5469 (5.6852) 
+03/21 08:31:08 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.6893 (1.4072) +03/21 08:31:08 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0032 (0.0383) +03/21 08:31:08 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 08:31:08 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.5008 (8.5555) +03/21 08:31:08 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 08:39:31 INFO train_distill_dimo.py:1149] Iteration 1290, lr_s=9.85e-06 lr_a=9.85e-06, time=50.23s +03/21 08:39:31 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 08:39:31 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 08:39:31 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.7344 (4.0506) +03/21 08:39:31 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.6762 (0.8069) +03/21 08:39:31 INFO train_distill_dimo.py:1160] Train loss_pg: 0.1164 (0.1248) +03/21 08:39:31 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 08:39:31 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.1564 (8.9856) +03/21 08:39:31 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 08:47:48 INFO train_distill_dimo.py:1149] Iteration 1300, lr_s=9.84e-06 lr_a=9.84e-06, time=48.15s +03/21 08:47:48 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 08:47:48 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 08:47:48 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.6250 (4.0988) +03/21 08:47:48 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.7115 (1.2437) +03/21 08:47:48 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0140 (0.0391) +03/21 08:47:48 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 08:47:48 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.2974 (8.2477) +03/21 08:47:48 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 08:47:48 INFO train_distill_dimo.py:1047] < PROGRESS: 13.01% | SPEED: 51.418s / step | ETA: 5 days, 4:15:38 > +03/21 08:48:04 INFO train_distill_dimo.py:1135] [save] step=1300 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-1300 (ZeRO-3 Gathered) +03/21 08:57:03 INFO train_distill_dimo.py:1149] Iteration 1310, lr_s=9.84e-06 lr_a=9.84e-06, time=55.26s +03/21 08:57:03 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 08:57:03 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 08:57:03 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.5000 (5.8828) +03/21 08:57:03 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5079 (0.5182) +03/21 08:57:03 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0040 (0.0389) +03/21 08:57:03 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 08:57:03 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.6611 (9.5945) +03/21 08:57:03 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 09:05:22 INFO train_distill_dimo.py:1149] Iteration 1320, lr_s=9.84e-06 lr_a=9.84e-06, time=50.23s +03/21 09:05:22 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 09:05:22 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 09:05:22 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.4844 (4.6672) +03/21 09:05:22 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5748 (1.1183) +03/21 09:05:22 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0139 (0.0755) +03/21 
09:05:22 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 09:05:22 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.2711 (9.2106) +03/21 09:05:22 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 09:13:45 INFO train_distill_dimo.py:1149] Iteration 1330, lr_s=9.83e-06 lr_a=9.83e-06, time=51.99s +03/21 09:13:45 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 09:13:45 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 09:13:45 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.4219 (4.4834) +03/21 09:13:45 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4547 (0.7838) +03/21 09:13:45 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0159 (0.0631) +03/21 09:13:45 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 09:13:45 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.6103 (9.5916) +03/21 09:13:45 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 09:22:37 INFO train_distill_dimo.py:1149] Iteration 1340, lr_s=9.83e-06 lr_a=9.83e-06, time=50.66s +03/21 09:22:37 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 09:22:37 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 09:22:37 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.2344 (4.7958) +03/21 09:22:37 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.6109 (0.6300) +03/21 09:22:37 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0060 (0.0701) +03/21 09:22:37 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 09:22:37 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.4832 (9.4613) +03/21 09:22:37 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 09:30:52 INFO train_distill_dimo.py:1149] Iteration 1350, lr_s=9.82e-06 lr_a=9.82e-06, time=49.84s +03/21 09:30:52 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 09:30:52 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 09:30:52 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.0938 (5.0000) +03/21 09:30:52 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3926 (0.3861) +03/21 09:30:52 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0053 (0.0215) +03/21 09:30:52 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 09:30:52 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.5881 (9.5743) +03/21 09:30:52 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 09:39:48 INFO train_distill_dimo.py:1149] Iteration 1360, lr_s=9.82e-06 lr_a=9.82e-06, time=53.61s +03/21 09:39:48 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 09:39:48 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 09:39:48 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.9844 (4.8796) +03/21 09:39:48 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4944 (0.7141) +03/21 09:39:48 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0047 (0.0989) +03/21 09:39:48 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 09:39:48 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.3692 (9.3582) +03/21 09:39:48 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 09:48:20 INFO train_distill_dimo.py:1149] Iteration 1370, lr_s=9.82e-06 lr_a=9.82e-06, time=49.48s +03/21 09:48:20 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 
(0.0000) +03/21 09:48:20 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 09:48:20 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.2344 (6.0531) +03/21 09:48:20 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4714 (1.4468) +03/21 09:48:20 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0137 (0.0145) +03/21 09:48:20 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 09:48:20 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.1914 (9.1210) +03/21 09:48:20 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 09:56:55 INFO train_distill_dimo.py:1149] Iteration 1380, lr_s=9.81e-06 lr_a=9.81e-06, time=56.61s +03/21 09:56:55 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 09:56:55 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 09:56:55 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.9531 (4.7883) +03/21 09:56:55 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.7553 (0.9244) +03/21 09:56:55 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0212 (0.0680) +03/21 09:56:55 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 09:56:55 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.5764 (8.5394) +03/21 09:56:55 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 10:05:27 INFO train_distill_dimo.py:1149] Iteration 1390, lr_s=9.81e-06 lr_a=9.81e-06, time=49.22s +03/21 10:05:27 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 10:05:27 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 10:05:27 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.1953 (4.8509) +03/21 10:05:27 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4647 (1.0215) +03/21 10:05:27 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0204 (0.0584) +03/21 10:05:27 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 10:05:27 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.5455 (8.5712) +03/21 10:05:27 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 10:13:56 INFO train_distill_dimo.py:1149] Iteration 1400, lr_s=9.80e-06 lr_a=9.80e-06, time=52.60s +03/21 10:13:56 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 10:13:56 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 10:13:56 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.0625 (5.7859) +03/21 10:13:56 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4400 (0.8384) +03/21 10:13:56 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0048 (0.0278) +03/21 10:13:56 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 10:13:56 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0178 (8.8805) +03/21 10:13:56 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 10:13:56 INFO train_distill_dimo.py:1047] < PROGRESS: 14.01% | SPEED: 51.425s / step | ETA: 5 days, 2:50:58 > +03/21 10:14:12 INFO train_distill_dimo.py:1135] [save] step=1400 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-1400 (ZeRO-3 Gathered) +03/21 10:22:35 INFO train_distill_dimo.py:1149] Iteration 1410, lr_s=9.80e-06 lr_a=9.80e-06, time=49.65s +03/21 10:22:35 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 10:22:35 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 10:22:35 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.7656 
(6.0398) +03/21 10:22:35 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5099 (0.8093) +03/21 10:22:35 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0030 (0.0156) +03/21 10:22:35 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 10:22:35 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.6909 (9.6404) +03/21 10:22:35 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 10:31:05 INFO train_distill_dimo.py:1149] Iteration 1420, lr_s=9.79e-06 lr_a=9.79e-06, time=48.88s +03/21 10:31:05 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 10:31:05 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 10:31:05 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.2656 (5.6930) +03/21 10:31:05 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.8238 (2.3675) +03/21 10:31:05 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0188 (0.0487) +03/21 10:31:05 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 10:31:05 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.2086 (8.1436) +03/21 10:31:05 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 10:39:41 INFO train_distill_dimo.py:1149] Iteration 1430, lr_s=9.79e-06 lr_a=9.79e-06, time=52.64s +03/21 10:39:41 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 10:39:41 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 10:39:41 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 1.5078 (3.4062) +03/21 10:39:41 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3746 (0.6064) +03/21 10:39:41 INFO train_distill_dimo.py:1160] Train loss_pg: 0.1018 (0.0923) +03/21 10:39:41 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 10:39:41 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.7355 (8.5976) +03/21 10:39:41 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 10:48:22 INFO train_distill_dimo.py:1149] Iteration 1440, lr_s=9.78e-06 lr_a=9.78e-06, time=53.63s +03/21 10:48:22 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 10:48:22 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 10:48:22 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.6953 (5.0207) +03/21 10:48:22 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.0917 (1.6413) +03/21 10:48:22 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0462 (0.1491) +03/21 10:48:22 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 10:48:22 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.3044 (9.3268) +03/21 10:48:22 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 10:56:52 INFO train_distill_dimo.py:1149] Iteration 1450, lr_s=9.78e-06 lr_a=9.78e-06, time=48.68s +03/21 10:56:52 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 10:56:52 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 10:56:52 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.6719 (5.4057) +03/21 10:56:52 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5983 (1.3840) +03/21 10:56:52 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0026 (0.0580) +03/21 10:56:52 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 10:56:52 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.3215 (9.2130) +03/21 10:56:52 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) 
+03/21 11:05:30 INFO train_distill_dimo.py:1149] Iteration 1460, lr_s=9.78e-06 lr_a=9.78e-06, time=48.50s +03/21 11:05:30 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 11:05:30 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 11:05:30 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.4219 (6.5524) +03/21 11:05:30 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5176 (0.7632) +03/21 11:05:30 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0028 (0.0337) +03/21 11:05:30 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 11:05:30 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.1610 (9.0038) +03/21 11:05:30 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 11:14:11 INFO train_distill_dimo.py:1149] Iteration 1470, lr_s=9.77e-06 lr_a=9.77e-06, time=50.25s +03/21 11:14:11 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 11:14:11 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 11:14:11 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.7969 (4.7759) +03/21 11:14:11 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4026 (0.4310) +03/21 11:14:11 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0117 (0.0415) +03/21 11:14:11 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 11:14:11 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.3880 (9.2901) +03/21 11:14:11 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 11:22:48 INFO train_distill_dimo.py:1149] Iteration 1480, lr_s=9.77e-06 lr_a=9.77e-06, time=49.85s +03/21 11:22:48 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 11:22:48 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 11:22:48 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.3750 (5.5516) +03/21 11:22:48 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4660 (0.4910) +03/21 11:22:48 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0047 (0.0115) +03/21 11:22:48 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 11:22:48 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.1817 (9.2375) +03/21 11:22:48 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 11:31:11 INFO train_distill_dimo.py:1149] Iteration 1490, lr_s=9.76e-06 lr_a=9.76e-06, time=52.71s +03/21 11:31:11 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 11:31:11 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 11:31:11 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.2344 (5.4670) +03/21 11:31:11 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4608 (1.2465) +03/21 11:31:11 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0046 (0.0514) +03/21 11:31:11 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 11:31:11 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.3328 (9.1518) +03/21 11:31:11 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 11:39:29 INFO train_distill_dimo.py:1149] Iteration 1500, lr_s=9.76e-06 lr_a=9.76e-06, time=49.31s +03/21 11:39:29 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 11:39:29 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 11:39:29 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.4844 (5.3910) +03/21 11:39:29 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4534 
(0.7181) +03/21 11:39:29 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0038 (0.0125) +03/21 11:39:29 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 11:39:29 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.3977 (9.4267) +03/21 11:39:29 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 11:39:29 INFO train_distill_dimo.py:1047] < PROGRESS: 15.01% | SPEED: 51.408s / step | ETA: 5 days, 1:22:51 > +03/21 11:39:40 INFO train_distill_dimo.py:1135] [save] step=1500 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-1500 (ZeRO-3 Gathered) +03/21 11:48:13 INFO train_distill_dimo.py:1149] Iteration 1510, lr_s=9.75e-06 lr_a=9.75e-06, time=57.01s +03/21 11:48:13 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 11:48:13 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 11:48:13 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.8125 (7.1672) +03/21 11:48:13 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 1.3154 (1.9404) +03/21 11:48:13 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0065 (0.0088) +03/21 11:48:13 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 11:48:13 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.9392 (9.0002) +03/21 11:48:13 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 11:56:33 INFO train_distill_dimo.py:1149] Iteration 1520, lr_s=9.75e-06 lr_a=9.75e-06, time=50.32s +03/21 11:56:33 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 11:56:33 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 11:56:33 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.2734 (4.3268) +03/21 11:56:33 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.2776 (0.9766) +03/21 11:56:33 INFO train_distill_dimo.py:1160] Train loss_pg: 0.1125 (0.1613) +03/21 11:56:33 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 11:56:33 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0866 (8.7773) +03/21 11:56:33 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 12:04:56 INFO train_distill_dimo.py:1149] Iteration 1530, lr_s=9.74e-06 lr_a=9.74e-06, time=53.02s +03/21 12:04:56 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 12:04:56 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 12:04:56 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.3281 (5.0567) +03/21 12:04:56 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4070 (0.6713) +03/21 12:04:56 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0066 (0.0974) +03/21 12:04:56 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 12:04:56 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.4608 (9.4690) +03/21 12:04:56 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 12:13:34 INFO train_distill_dimo.py:1149] Iteration 1540, lr_s=9.74e-06 lr_a=9.74e-06, time=53.11s +03/21 12:13:34 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 12:13:34 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 12:13:34 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.1953 (3.5671) +03/21 12:13:34 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.2128 (0.2665) +03/21 12:13:34 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0206 (0.0612) +03/21 12:13:34 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 
(0.0000) +03/21 12:13:34 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.3379 (9.2537) +03/21 12:13:34 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 12:22:13 INFO train_distill_dimo.py:1149] Iteration 1550, lr_s=9.73e-06 lr_a=9.73e-06, time=49.02s +03/21 12:22:13 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 12:22:13 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 12:22:13 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.3281 (4.4828) +03/21 12:22:13 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5430 (0.9791) +03/21 12:22:13 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0228 (0.0938) +03/21 12:22:13 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 12:22:13 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.4356 (8.2593) +03/21 12:22:13 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 12:30:43 INFO train_distill_dimo.py:1149] Iteration 1560, lr_s=9.73e-06 lr_a=9.73e-06, time=54.22s +03/21 12:30:43 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 12:30:43 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 12:30:43 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.0820 (3.8568) +03/21 12:30:43 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3278 (0.6778) +03/21 12:30:43 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0878 (0.1026) +03/21 12:30:43 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 12:30:43 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.8989 (9.0216) +03/21 12:30:43 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 12:39:12 INFO train_distill_dimo.py:1149] Iteration 1570, lr_s=9.72e-06 lr_a=9.72e-06, time=49.45s +03/21 12:39:12 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 12:39:12 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 12:39:12 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.0547 (2.9797) +03/21 12:39:12 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3340 (0.3049) +03/21 12:39:12 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0456 (0.0975) +03/21 12:39:12 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 12:39:12 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.6569 (9.6231) +03/21 12:39:12 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 12:47:48 INFO train_distill_dimo.py:1149] Iteration 1580, lr_s=9.72e-06 lr_a=9.72e-06, time=57.62s +03/21 12:47:48 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 12:47:48 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 12:47:48 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 2.8125 (3.3063) +03/21 12:47:48 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3177 (0.2813) +03/21 12:47:48 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0243 (0.0340) +03/21 12:47:48 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 12:47:48 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.1190 (9.1760) +03/21 12:47:48 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 12:56:12 INFO train_distill_dimo.py:1149] Iteration 1590, lr_s=9.71e-06 lr_a=9.71e-06, time=50.45s +03/21 12:56:12 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 12:56:12 INFO train_distill_dimo.py:1160] Train 
baseline_ema: 0.0000 (0.0000) +03/21 12:56:12 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.6562 (3.7201) +03/21 12:56:12 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4219 (0.5695) +03/21 12:56:12 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0236 (0.0699) +03/21 12:56:12 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 12:56:12 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0566 (8.9527) +03/21 12:56:12 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 13:04:33 INFO train_distill_dimo.py:1149] Iteration 1600, lr_s=9.71e-06 lr_a=9.71e-06, time=49.33s +03/21 13:04:33 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 13:04:33 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 13:04:33 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.0938 (4.6819) +03/21 13:04:33 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.6518 (1.1273) +03/21 13:04:33 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0751 (0.1628) +03/21 13:04:33 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 13:04:33 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.7820 (8.3973) +03/21 13:04:33 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 13:04:33 INFO train_distill_dimo.py:1047] < PROGRESS: 16.01% | SPEED: 51.379s / step | ETA: 4 days, 23:53:00 > +03/21 13:04:44 INFO train_distill_dimo.py:1135] [save] step=1600 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-1600 (ZeRO-3 Gathered) +03/21 13:13:08 INFO train_distill_dimo.py:1149] Iteration 1610, lr_s=9.70e-06 lr_a=9.70e-06, time=50.15s +03/21 13:13:08 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 13:13:08 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 13:13:08 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.0078 (3.9663) +03/21 13:13:08 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4120 (0.4439) +03/21 13:13:08 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0187 (0.1266) +03/21 13:13:08 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 13:13:08 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.7902 (8.8154) +03/21 13:13:08 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 13:21:38 INFO train_distill_dimo.py:1149] Iteration 1620, lr_s=9.70e-06 lr_a=9.70e-06, time=55.32s +03/21 13:21:38 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 13:21:38 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 13:21:38 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.2031 (4.3298) +03/21 13:21:38 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3725 (0.5400) +03/21 13:21:38 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0145 (0.0769) +03/21 13:21:38 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 13:21:38 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.3345 (9.3557) +03/21 13:21:38 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 13:30:04 INFO train_distill_dimo.py:1149] Iteration 1630, lr_s=9.69e-06 lr_a=9.69e-06, time=53.07s +03/21 13:30:04 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 13:30:04 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 13:30:04 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.3906 (4.4437) +03/21 13:30:04 INFO train_distill_dimo.py:1160] Train 
loss_kd_cond: 0.4166 (0.5693) +03/21 13:30:04 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0113 (0.0296) +03/21 13:30:04 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 13:30:04 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.8699 (8.8438) +03/21 13:30:04 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 13:39:00 INFO train_distill_dimo.py:1149] Iteration 1640, lr_s=9.68e-06 lr_a=9.68e-06, time=51.30s +03/21 13:39:00 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 13:39:00 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 13:39:00 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.0625 (6.6219) +03/21 13:39:00 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3726 (1.3244) +03/21 13:39:00 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0042 (0.0229) +03/21 13:39:00 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 13:39:00 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.1070 (9.0967) +03/21 13:39:00 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 13:47:16 INFO train_distill_dimo.py:1149] Iteration 1650, lr_s=9.68e-06 lr_a=9.68e-06, time=48.57s +03/21 13:47:16 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 13:47:16 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 13:47:16 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.7656 (7.1203) +03/21 13:47:16 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5089 (0.7381) +03/21 13:47:16 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0012 (0.0100) +03/21 13:47:16 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 13:47:16 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.7343 (8.7334) +03/21 13:47:16 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 13:55:42 INFO train_distill_dimo.py:1149] Iteration 1660, lr_s=9.67e-06 lr_a=9.67e-06, time=48.64s +03/21 13:55:42 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 13:55:42 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 13:55:42 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.6094 (5.0996) +03/21 13:55:42 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4430 (0.8258) +03/21 13:55:42 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0040 (0.0619) +03/21 13:55:42 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 13:55:42 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.6500 (8.5350) +03/21 13:55:42 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 14:04:10 INFO train_distill_dimo.py:1149] Iteration 1670, lr_s=9.67e-06 lr_a=9.67e-06, time=48.10s +03/21 14:04:10 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 14:04:10 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 14:04:10 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.4062 (5.2803) +03/21 14:04:10 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4198 (0.9443) +03/21 14:04:10 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0039 (0.1003) +03/21 14:04:10 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 14:04:10 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0319 (8.9535) +03/21 14:04:10 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 14:12:41 INFO train_distill_dimo.py:1149] Iteration 1680, 
lr_s=9.66e-06 lr_a=9.66e-06, time=50.63s +03/21 14:12:41 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 14:12:41 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 14:12:41 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.5625 (5.3189) +03/21 14:12:41 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3910 (0.3980) +03/21 14:12:41 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0025 (0.1042) +03/21 14:12:41 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 14:12:41 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.8841 (8.9065) +03/21 14:12:41 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 14:21:03 INFO train_distill_dimo.py:1149] Iteration 1690, lr_s=9.66e-06 lr_a=9.66e-06, time=53.38s +03/21 14:21:03 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 14:21:03 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 14:21:03 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.1719 (6.6361) +03/21 14:21:03 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5585 (1.7093) +03/21 14:21:03 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0019 (0.0542) +03/21 14:21:03 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 14:21:03 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.1388 (9.1276) +03/21 14:21:03 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 14:29:26 INFO train_distill_dimo.py:1149] Iteration 1700, lr_s=9.65e-06 lr_a=9.65e-06, time=51.04s +03/21 14:29:26 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 14:29:26 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 14:29:26 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.4219 (6.6297) +03/21 14:29:26 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4306 (0.5691) +03/21 14:29:26 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0018 (0.0058) +03/21 14:29:26 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 14:29:26 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.9264 (8.8499) +03/21 14:29:26 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 14:29:26 INFO train_distill_dimo.py:1047] < PROGRESS: 17.01% | SPEED: 51.346s / step | ETA: 4 days, 22:22:49 > +03/21 14:29:37 INFO train_distill_dimo.py:1135] [save] step=1700 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-1700 (ZeRO-3 Gathered) +03/21 14:38:13 INFO train_distill_dimo.py:1149] Iteration 1710, lr_s=9.65e-06 lr_a=9.65e-06, time=55.16s +03/21 14:38:13 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 14:38:13 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 14:38:13 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.0625 (6.1059) +03/21 14:38:13 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5877 (0.6463) +03/21 14:38:13 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0015 (0.0390) +03/21 14:38:13 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 14:38:13 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.2729 (9.2233) +03/21 14:38:13 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 14:46:43 INFO train_distill_dimo.py:1149] Iteration 1720, lr_s=9.64e-06 lr_a=9.64e-06, time=49.31s +03/21 14:46:43 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 14:46:43 INFO train_distill_dimo.py:1160] 
Train baseline_ema: 0.0000 (0.0000) +03/21 14:46:43 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.9375 (4.4347) +03/21 14:46:43 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.7863 (1.7196) +03/21 14:46:43 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0128 (0.0486) +03/21 14:46:43 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 14:46:43 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.5637 (8.2877) +03/21 14:46:43 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 14:55:14 INFO train_distill_dimo.py:1149] Iteration 1730, lr_s=9.63e-06 lr_a=9.63e-06, time=49.71s +03/21 14:55:14 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 14:55:14 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 14:55:14 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.0078 (3.0935) +03/21 14:55:14 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3680 (0.3130) +03/21 14:55:14 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0426 (0.1069) +03/21 14:55:14 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 14:55:14 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.9166 (8.8111) +03/21 14:55:14 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 15:03:28 INFO train_distill_dimo.py:1149] Iteration 1740, lr_s=9.63e-06 lr_a=9.63e-06, time=51.10s +03/21 15:03:28 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 15:03:28 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 15:03:28 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.2656 (5.9310) +03/21 15:03:28 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4867 (0.5420) +03/21 15:03:28 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0009 (0.0388) +03/21 15:03:28 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 15:03:28 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0305 (9.0726) +03/21 15:03:28 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 15:11:49 INFO train_distill_dimo.py:1149] Iteration 1750, lr_s=9.62e-06 lr_a=9.62e-06, time=49.98s +03/21 15:11:49 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 15:11:49 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 15:11:49 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 2.8047 (2.7704) +03/21 15:11:49 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.2517 (0.3457) +03/21 15:11:49 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0170 (0.0647) +03/21 15:11:49 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 15:11:49 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.1213 (9.0681) +03/21 15:11:49 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 15:20:18 INFO train_distill_dimo.py:1149] Iteration 1760, lr_s=9.62e-06 lr_a=9.62e-06, time=54.80s +03/21 15:20:18 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 15:20:18 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 15:20:18 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.6562 (5.5719) +03/21 15:20:18 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4338 (1.0349) +03/21 15:20:18 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0083 (0.0704) +03/21 15:20:18 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 15:20:18 INFO train_distill_dimo.py:1160] Train 
tok_entropy: 9.1135 (9.0942) +03/21 15:20:18 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 15:28:55 INFO train_distill_dimo.py:1149] Iteration 1770, lr_s=9.61e-06 lr_a=9.61e-06, time=49.13s +03/21 15:28:55 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 15:28:55 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 15:28:55 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.5625 (4.6814) +03/21 15:28:55 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4264 (0.7389) +03/21 15:28:55 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0321 (0.0857) +03/21 15:28:55 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 15:28:55 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.7759 (8.8866) +03/21 15:28:55 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 15:37:09 INFO train_distill_dimo.py:1149] Iteration 1780, lr_s=9.60e-06 lr_a=9.60e-06, time=49.12s +03/21 15:37:09 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 15:37:09 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 15:37:09 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.7109 (5.2768) +03/21 15:37:09 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3911 (0.8267) +03/21 15:37:09 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0141 (0.0791) +03/21 15:37:09 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 15:37:09 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.1824 (9.2173) +03/21 15:37:09 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 15:45:43 INFO train_distill_dimo.py:1149] Iteration 1790, lr_s=9.60e-06 lr_a=9.60e-06, time=52.88s +03/21 15:45:43 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 15:45:43 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 15:45:43 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.5703 (3.8482) +03/21 15:45:43 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3157 (1.0296) +03/21 15:45:43 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0275 (0.0977) +03/21 15:45:43 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 15:45:43 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.2595 (9.1730) +03/21 15:45:43 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 15:53:59 INFO train_distill_dimo.py:1149] Iteration 1800, lr_s=9.59e-06 lr_a=9.59e-06, time=48.62s +03/21 15:53:59 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 15:53:59 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 15:53:59 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.4766 (4.1184) +03/21 15:53:59 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.2149 (0.3815) +03/21 15:53:59 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0360 (0.1278) +03/21 15:53:59 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 15:53:59 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.1095 (9.1453) +03/21 15:53:59 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 15:53:59 INFO train_distill_dimo.py:1047] < PROGRESS: 18.01% | SPEED: 51.306s / step | ETA: 4 days, 20:51:45 > +03/21 15:54:11 INFO train_distill_dimo.py:1135] [save] step=1800 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-1800 (ZeRO-3 Gathered) +03/21 16:02:43 INFO train_distill_dimo.py:1149] 
Iteration 1810, lr_s=9.58e-06 lr_a=9.58e-06, time=49.79s +03/21 16:02:43 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 16:02:43 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 16:02:43 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.4531 (5.0183) +03/21 16:02:43 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4605 (0.8332) +03/21 16:02:43 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0112 (0.0822) +03/21 16:02:43 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 16:02:43 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.2654 (9.0334) +03/21 16:02:43 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 16:11:07 INFO train_distill_dimo.py:1149] Iteration 1820, lr_s=9.58e-06 lr_a=9.58e-06, time=51.09s +03/21 16:11:07 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 16:11:07 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 16:11:07 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.8672 (3.8044) +03/21 16:11:07 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4913 (0.6252) +03/21 16:11:07 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0449 (0.1264) +03/21 16:11:07 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 16:11:07 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.7857 (8.7435) +03/21 16:11:07 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 16:19:36 INFO train_distill_dimo.py:1149] Iteration 1830, lr_s=9.57e-06 lr_a=9.57e-06, time=50.08s +03/21 16:19:36 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 16:19:36 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 16:19:36 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 6.8750 (5.5548) +03/21 16:19:36 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.6672 (0.9893) +03/21 16:19:36 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0058 (0.0710) +03/21 16:19:36 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 16:19:36 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0688 (9.0476) +03/21 16:19:36 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 16:28:08 INFO train_distill_dimo.py:1149] Iteration 1840, lr_s=9.57e-06 lr_a=9.57e-06, time=49.96s +03/21 16:28:08 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 16:28:08 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 16:28:08 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.0781 (5.0627) +03/21 16:28:08 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3633 (0.9976) +03/21 16:28:08 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0097 (0.0638) +03/21 16:28:08 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 16:28:08 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.2916 (8.9247) +03/21 16:28:08 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 16:36:36 INFO train_distill_dimo.py:1149] Iteration 1850, lr_s=9.56e-06 lr_a=9.56e-06, time=54.19s +03/21 16:36:36 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 16:36:36 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 16:36:36 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 0.0575 (1.0884) +03/21 16:36:36 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.0857 (0.1329) +03/21 16:36:36 INFO 
train_distill_dimo.py:1160] Train loss_pg: 0.2901 (0.2392) +03/21 16:36:36 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 16:36:36 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.2192 (9.2234) +03/21 16:36:36 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 16:45:11 INFO train_distill_dimo.py:1149] Iteration 1860, lr_s=9.55e-06 lr_a=9.55e-06, time=49.99s +03/21 16:45:11 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 16:45:11 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 16:45:11 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.4688 (6.8020) +03/21 16:45:11 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4976 (0.8029) +03/21 16:45:11 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0010 (0.0117) +03/21 16:45:11 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 16:45:11 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.6705 (8.7319) +03/21 16:45:11 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 16:53:41 INFO train_distill_dimo.py:1149] Iteration 1870, lr_s=9.55e-06 lr_a=9.55e-06, time=61.48s +03/21 16:53:41 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 16:53:41 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 16:53:41 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.9375 (5.4840) +03/21 16:53:41 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3954 (0.4071) +03/21 16:53:41 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0031 (0.0360) +03/21 16:53:41 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 16:53:41 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.1309 (9.1313) +03/21 16:53:41 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 17:02:11 INFO train_distill_dimo.py:1149] Iteration 1880, lr_s=9.54e-06 lr_a=9.54e-06, time=52.60s +03/21 17:02:11 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 17:02:11 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 17:02:11 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 2.2891 (3.2094) +03/21 17:02:11 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.2237 (0.3893) +03/21 17:02:11 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0640 (0.0600) +03/21 17:02:11 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 17:02:11 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.5267 (8.6924) +03/21 17:02:11 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 17:10:27 INFO train_distill_dimo.py:1149] Iteration 1890, lr_s=9.53e-06 lr_a=9.53e-06, time=49.32s +03/21 17:10:27 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 17:10:27 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 17:10:27 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 2.7266 (4.0855) +03/21 17:10:27 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3418 (0.8000) +03/21 17:10:27 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0390 (0.0849) +03/21 17:10:27 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 17:10:27 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.8930 (8.6296) +03/21 17:10:27 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 17:18:40 INFO train_distill_dimo.py:1149] Iteration 1900, lr_s=9.53e-06 lr_a=9.53e-06, time=48.48s +03/21 
17:18:40 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 17:18:40 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 17:18:40 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 1.6992 (2.5487) +03/21 17:18:40 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.2723 (0.2652) +03/21 17:18:40 INFO train_distill_dimo.py:1160] Train loss_pg: 0.1316 (0.1355) +03/21 17:18:40 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 17:18:40 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.9431 (8.8799) +03/21 17:18:40 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 17:18:40 INFO train_distill_dimo.py:1047] < PROGRESS: 19.01% | SPEED: 51.273s / step | ETA: 4 days, 19:21:54 > +03/21 17:18:51 INFO train_distill_dimo.py:1135] [save] step=1900 → ./experiments/distill_dimo_v3/checkpoints/checkpoint-1900 (ZeRO-3 Gathered) +03/21 17:27:08 INFO train_distill_dimo.py:1149] Iteration 1910, lr_s=9.52e-06 lr_a=9.52e-06, time=49.46s +03/21 17:27:08 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 17:27:08 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 17:27:08 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.8125 (4.9609) +03/21 17:27:08 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3484 (0.7257) +03/21 17:27:08 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0028 (0.0642) +03/21 17:27:08 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 17:27:08 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0855 (9.1605) +03/21 17:27:08 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 17:35:39 INFO train_distill_dimo.py:1149] Iteration 1920, lr_s=9.51e-06 lr_a=9.51e-06, time=49.28s +03/21 17:35:39 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 17:35:39 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 17:35:39 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.4688 (5.3229) +03/21 17:35:39 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4537 (0.5863) +03/21 17:35:39 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0074 (0.0483) +03/21 17:35:39 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 17:35:39 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.1902 (9.1589) +03/21 17:35:39 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 17:43:56 INFO train_distill_dimo.py:1149] Iteration 1930, lr_s=9.51e-06 lr_a=9.51e-06, time=49.26s +03/21 17:43:56 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 17:43:56 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 17:43:56 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 5.4062 (4.8886) +03/21 17:43:56 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.4423 (0.6278) +03/21 17:43:56 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0071 (0.0658) +03/21 17:43:56 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 17:43:56 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.3326 (9.3497) +03/21 17:43:56 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 17:52:23 INFO train_distill_dimo.py:1149] Iteration 1940, lr_s=9.50e-06 lr_a=9.50e-06, time=48.74s +03/21 17:52:23 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 17:52:23 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 17:52:23 
INFO train_distill_dimo.py:1160] Train loss_aux_cond: 4.1406 (3.9742) +03/21 17:52:23 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3540 (0.3442) +03/21 17:52:23 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0163 (0.0268) +03/21 17:52:23 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 17:52:23 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0917 (9.0411) +03/21 17:52:23 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 18:00:49 INFO train_distill_dimo.py:1149] Iteration 1950, lr_s=9.49e-06 lr_a=9.49e-06, time=49.25s +03/21 18:00:49 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 18:00:49 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 18:00:49 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 7.1562 (6.6547) +03/21 18:00:49 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5106 (1.1109) +03/21 18:00:49 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0054 (0.0072) +03/21 18:00:49 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 18:00:49 INFO train_distill_dimo.py:1160] Train tok_entropy: 8.9659 (8.9808) +03/21 18:00:49 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 18:09:07 INFO train_distill_dimo.py:1149] Iteration 1960, lr_s=9.49e-06 lr_a=9.49e-06, time=50.55s +03/21 18:09:07 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 18:09:07 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 18:09:07 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 3.9141 (3.7931) +03/21 18:09:07 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.3625 (0.3407) +03/21 18:09:07 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0407 (0.0780) +03/21 18:09:07 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 18:09:07 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.3255 (9.2506) +03/21 18:09:07 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) +03/21 18:17:43 INFO train_distill_dimo.py:1149] Iteration 1970, lr_s=9.48e-06 lr_a=9.48e-06, time=51.47s +03/21 18:17:43 INFO train_distill_dimo.py:1160] Train H_mean: 0.0000 (0.0000) +03/21 18:17:43 INFO train_distill_dimo.py:1160] Train baseline_ema: 0.0000 (0.0000) +03/21 18:17:43 INFO train_distill_dimo.py:1160] Train loss_aux_cond: 8.0938 (6.7119) +03/21 18:17:43 INFO train_distill_dimo.py:1160] Train loss_kd_cond: 0.5749 (0.8289) +03/21 18:17:43 INFO train_distill_dimo.py:1160] Train loss_pg: 0.0013 (0.0356) +03/21 18:17:43 INFO train_distill_dimo.py:1160] Train mean_logp_tok: 0.0000 (0.0000) +03/21 18:17:43 INFO train_distill_dimo.py:1160] Train tok_entropy: 9.0420 (9.0304) +03/21 18:17:43 INFO train_distill_dimo.py:1160] Train use_guided_ratio: 1.0000 (1.0000) diff --git a/URSA/experiments/smoke/config.yaml b/URSA/experiments/smoke/config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..99638224d34ce5ec94cd8d478df68fa0fbbfb8c6 --- /dev/null +++ b/URSA/experiments/smoke/config.yaml @@ -0,0 +1,69 @@ +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 
+ lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 4 + caption_field: caption +config: ./configs/distill_dimo.yaml diff --git a/URSA/experiments/smoke/logs/20260318_142425.log b/URSA/experiments/smoke/logs/20260318_142425.log new file mode 100644 index 0000000000000000000000000000000000000000..6b4cd2aea726bf90466bb2b5d16854e27590e5a7 --- /dev/null +++ b/URSA/experiments/smoke/logs/20260318_142425.log @@ -0,0 +1,72 @@ +03/18 14:24:25 INFO train_distill_dimo.py:792] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: ./URSA-1.7B/ + prompt_source: ./Koala-36M-v1/ + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 2 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 14:24:25 INFO train_distill_dimo.py:123] [init] Loading teacher from ./URSA-1.7B/ ... 
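The smoke configs above schedule the learning rate with diffnext.engine.lr_scheduler.CosineLR, where lr_max is interpolated from ${optimizer_student.params.lr} (1.0e-05), lr_min is 1.0e-06, max_steps is 50, and warmup_steps is 500. The sketch below shows the conventional warmup-then-cosine form such a scheduler computes; it is an assumption about CosineLR's internals, not code from this repository. Note that with max_train_steps=50 but warmup_steps=500 the smoke run never leaves warmup, which matches the lr_s=1.01e-06 logged at iteration 50; the later 20260318_155707.log run sets warmup_steps: 0, presumably to exercise the cosine branch.

import math

def cosine_lr(step, lr_max=1.0e-05, lr_min=1.0e-06, max_steps=50, warmup_steps=500):
    # Hypothetical re-implementation of a warmup-then-cosine schedule; the
    # actual diffnext CosineLR may differ in details (e.g. off-by-one indexing).
    if warmup_steps > 0 and step < warmup_steps:
        return lr_max * step / warmup_steps  # linear warmup from 0 to lr_max
    span = max(1, max_steps - warmup_steps)
    t = min(1.0, (step - warmup_steps) / span)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

Under this form, cosine_lr(50) = 1.0e-05 * 50/500 = 1.0e-06, in line with the logged value.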
diff --git a/URSA/experiments/smoke/logs/20260318_142528.log b/URSA/experiments/smoke/logs/20260318_142528.log new file mode 100644 index 0000000000000000000000000000000000000000..34a98543755992923c7bdc8426c2bfb08c3c3a0f --- /dev/null +++ b/URSA/experiments/smoke/logs/20260318_142528.log @@ -0,0 +1,73 @@ +03/18 14:25:28 INFO train_distill_dimo.py:792] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: ./Koala-36M-v1/ + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 2 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 14:25:28 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
+03/18 14:26:36 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON diff --git a/URSA/experiments/smoke/logs/20260318_142714.log b/URSA/experiments/smoke/logs/20260318_142714.log new file mode 100644 index 0000000000000000000000000000000000000000..24b2be5f7f4473bde57c095f49e03caafe38e243 --- /dev/null +++ b/URSA/experiments/smoke/logs/20260318_142714.log @@ -0,0 +1,76 @@ +03/18 14:27:14 INFO train_distill_dimo.py:792] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 2 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 14:27:14 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
+03/18 14:28:19 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON +03/18 14:28:23 INFO train_distill_dimo.py:242] [init] student params: 1982.17M +03/18 14:28:23 INFO train_distill_dimo.py:245] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=1 +03/18 14:28:23 INFO train_distill_dimo.py:597] [train] Starting from step 0 / 50 diff --git a/URSA/experiments/smoke/logs/20260318_143146.log b/URSA/experiments/smoke/logs/20260318_143146.log new file mode 100644 index 0000000000000000000000000000000000000000..b411261114c7f724d42872e818fcb0cbaf495a6c --- /dev/null +++ b/URSA/experiments/smoke/logs/20260318_143146.log @@ -0,0 +1,76 @@ +03/18 14:31:46 INFO train_distill_dimo.py:792] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 2 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 14:31:46 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
+03/18 14:32:54 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON +03/18 14:32:56 INFO train_distill_dimo.py:242] [init] student params: 1982.17M +03/18 14:32:56 INFO train_distill_dimo.py:245] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=1 +03/18 14:32:56 INFO train_distill_dimo.py:597] [train] Starting from step 0 / 50 diff --git a/URSA/experiments/smoke/logs/20260318_143704.log b/URSA/experiments/smoke/logs/20260318_143704.log new file mode 100644 index 0000000000000000000000000000000000000000..94038df8d8b4b1ee9c8324fd48177be4d8b0e6e6 --- /dev/null +++ b/URSA/experiments/smoke/logs/20260318_143704.log @@ -0,0 +1,79 @@ +03/18 14:37:04 INFO train_distill_dimo.py:792] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 2 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 14:37:04 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
+03/18 14:38:14 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON +03/18 14:38:19 INFO train_distill_dimo.py:242] [init] student params: 1982.17M +03/18 14:38:19 INFO train_distill_dimo.py:245] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=1 +03/18 14:38:19 INFO train_distill_dimo.py:597] [train] Starting from step 0 / 50 +03/18 14:38:40 INFO train_distill_dimo.py:753] [assert] Step-1 shape/grad assertions PASSED ✓ +03/18 14:38:40 INFO train_distill_dimo.py:754] [assert] z_T_cond shape=torch.Size([1, 5120, 64000]) min=-1.531 max=49.000 +03/18 14:38:40 INFO train_distill_dimo.py:759] [assert] z_S_cond shape=torch.Size([1, 5120, 64000]) min=-1.531 max=49.000 diff --git a/URSA/experiments/smoke/logs/20260318_144413.log b/URSA/experiments/smoke/logs/20260318_144413.log new file mode 100644 index 0000000000000000000000000000000000000000..864b05d4465f2d073c45b24522099ec5a7deeab8 --- /dev/null +++ b/URSA/experiments/smoke/logs/20260318_144413.log @@ -0,0 +1,79 @@ +03/18 14:44:13 INFO train_distill_dimo.py:793] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 2 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 14:44:13 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
+03/18 14:45:21 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON +03/18 14:45:26 INFO train_distill_dimo.py:242] [init] student params: 1982.17M +03/18 14:45:26 INFO train_distill_dimo.py:245] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=1 +03/18 14:45:26 INFO train_distill_dimo.py:598] [train] Starting from step 0 / 50 +03/18 14:45:46 INFO train_distill_dimo.py:754] [assert] Step-1 shape/grad assertions PASSED ✓ +03/18 14:45:46 INFO train_distill_dimo.py:755] [assert] z_T_cond shape=torch.Size([1, 5120, 64000]) min=-1.531 max=49.000 +03/18 14:45:46 INFO train_distill_dimo.py:760] [assert] z_S_cond shape=torch.Size([1, 5120, 64000]) min=-1.531 max=49.000 diff --git a/URSA/experiments/smoke/logs/20260318_145041.log b/URSA/experiments/smoke/logs/20260318_145041.log new file mode 100644 index 0000000000000000000000000000000000000000..29e8b898f0b7779ecf2e236ae31009c30abd8b50 --- /dev/null +++ b/URSA/experiments/smoke/logs/20260318_145041.log @@ -0,0 +1,76 @@ +03/18 14:50:41 INFO train_distill_dimo.py:793] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 2 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 14:50:41 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
+03/18 14:51:50 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON +03/18 14:51:55 INFO train_distill_dimo.py:242] [init] student params: 1982.17M +03/18 14:51:55 INFO train_distill_dimo.py:245] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=1 +03/18 14:51:55 INFO train_distill_dimo.py:598] [train] Starting from step 0 / 50 diff --git a/URSA/experiments/smoke/logs/20260318_145414.log b/URSA/experiments/smoke/logs/20260318_145414.log new file mode 100644 index 0000000000000000000000000000000000000000..0d113671046f906e3701124265c06226f44ff9e5 --- /dev/null +++ b/URSA/experiments/smoke/logs/20260318_145414.log @@ -0,0 +1,79 @@ +03/18 14:54:14 INFO train_distill_dimo.py:803] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 2 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 14:54:14 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
+03/18 14:55:22 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON +03/18 14:55:27 INFO train_distill_dimo.py:252] [init] student params: 1982.17M +03/18 14:55:27 INFO train_distill_dimo.py:255] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=1 +03/18 14:55:27 INFO train_distill_dimo.py:608] [train] Starting from step 0 / 50 +03/18 14:55:49 INFO train_distill_dimo.py:764] [assert] Step-1 shape/grad assertions PASSED ✓ +03/18 14:55:49 INFO train_distill_dimo.py:765] [assert] z_T_cond shape=torch.Size([1, 5120, 64000]) min=-1.531 max=49.000 +03/18 14:55:49 INFO train_distill_dimo.py:770] [assert] z_S_cond shape=torch.Size([1, 5120, 64000]) min=-1.531 max=49.000 diff --git a/URSA/experiments/smoke/logs/20260318_145804.log b/URSA/experiments/smoke/logs/20260318_145804.log new file mode 100644 index 0000000000000000000000000000000000000000..2132fadb7c611245d1652d5c7ec67f675b89b92e --- /dev/null +++ b/URSA/experiments/smoke/logs/20260318_145804.log @@ -0,0 +1,91 @@ +03/18 14:58:04 INFO train_distill_dimo.py:803] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 2 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 14:58:04 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
+03/18 14:59:20 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON +03/18 14:59:38 INFO train_distill_dimo.py:252] [init] student params: 1982.17M +03/18 14:59:38 INFO train_distill_dimo.py:255] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=8 +03/18 14:59:38 INFO train_distill_dimo.py:608] [train] Starting from step 0 / 50 +03/18 15:00:20 INFO train_distill_dimo.py:764] [assert] Step-1 shape/grad assertions PASSED ✓ +03/18 15:00:20 INFO train_distill_dimo.py:765] [assert] z_T_cond shape=torch.Size([1, 5120, 64000]) min=0.766 max=48.500 +03/18 15:00:20 INFO train_distill_dimo.py:770] [assert] z_S_cond shape=torch.Size([1, 5120, 64000]) min=0.766 max=48.500 +03/18 15:03:04 INFO train_distill_dimo.py:690] Iteration 50, lr_s=1.01e-06 lr_a=1.01e-06, time=3.10s +03/18 15:03:04 INFO train_distill_dimo.py:701] Train H_mean: 3.1840 (3.2345) +03/18 15:03:04 INFO train_distill_dimo.py:701] Train baseline_ema: -0.0013 (-0.0013) +03/18 15:03:04 INFO train_distill_dimo.py:701] Train loss_aux_cond: 0.0109 (0.0224) +03/18 15:03:04 INFO train_distill_dimo.py:701] Train loss_kd_cond: 0.0059 (0.0122) +03/18 15:03:04 INFO train_distill_dimo.py:701] Train loss_pg: -0.0138 (-0.0165) +03/18 15:03:04 INFO train_distill_dimo.py:701] Train mean_logp_tok: -3.1811 (-3.2334) +03/18 15:03:04 INFO train_distill_dimo.py:701] Train tok_entropy: 8.4540 (8.4533) +03/18 15:03:04 INFO train_distill_dimo.py:701] Train use_guided_ratio: 0.0000 (0.0200) +03/18 15:03:32 INFO train_distill_dimo.py:676] [save] step=50 → ./experiments/smoke/checkpoints/checkpoint-50 +03/18 15:03:32 INFO train_distill_dimo.py:690] Iteration 50, lr_s=1.01e-06 lr_a=1.01e-06, time=3.10s +03/18 15:04:00 INFO train_distill_dimo.py:676] [save] step=50 → ./experiments/smoke/checkpoints/checkpoint-final diff --git a/URSA/experiments/smoke/logs/20260318_151056.log b/URSA/experiments/smoke/logs/20260318_151056.log new file mode 100644 index 0000000000000000000000000000000000000000..009b533e0bed02f189a431dcf2fd37307f4132d2 --- /dev/null +++ b/URSA/experiments/smoke/logs/20260318_151056.log @@ -0,0 +1,73 @@ +03/18 15:10:56 INFO train_distill_dimo.py:842] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + 
shuffle_files: true + shuffle_buffer: 50000 + num_workers: 2 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 15:10:56 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... +03/18 15:12:13 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON diff --git a/URSA/experiments/smoke/logs/20260318_151437.log b/URSA/experiments/smoke/logs/20260318_151437.log new file mode 100644 index 0000000000000000000000000000000000000000..6886eb25716f98f07e57db355412c1a44066d51a --- /dev/null +++ b/URSA/experiments/smoke/logs/20260318_151437.log @@ -0,0 +1,84 @@ +03/18 15:14:37 INFO train_distill_dimo.py:842] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 2 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 15:14:37 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... +03/18 15:15:51 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON +03/18 15:16:04 INFO train_distill_dimo.py:252] [init] student params: 1982.17M +03/18 15:16:04 INFO train_distill_dimo.py:255] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=8 +03/18 15:16:04 INFO train_distill_dimo.py:608] [train] Starting from step 0 / 50 +03/18 15:16:43 INFO train_distill_dimo.py:773] [assert] teacher/student param shared storage? 
False +03/18 15:16:43 INFO train_distill_dimo.py:778] [assert] max|z_T - z_S| (pre-update logits) = 0.000e+00 +03/18 15:16:43 INFO train_distill_dimo.py:789] [assert] subT stats: mean=21.617 std=6.242 min=4.562 max=41.750 neg_frac=0.000 +03/18 15:16:43 INFO train_distill_dimo.py:793] [assert] subS stats: mean=21.617 std=6.242 min=4.562 max=41.750 neg_frac=0.000 +03/18 15:16:43 INFO train_distill_dimo.py:801] [assert] max|student_param - teacher_param| after step = 5.960e-08 +03/18 15:16:43 INFO train_distill_dimo.py:803] [assert] Step-1 shape/grad assertions PASSED ✓ +03/18 15:16:43 INFO train_distill_dimo.py:804] [assert] z_T_cond shape=torch.Size([1, 5120, 64000]) min=0.766 max=48.500 +03/18 15:16:43 INFO train_distill_dimo.py:809] [assert] z_S_cond shape=torch.Size([1, 5120, 64000]) min=0.766 max=48.500 diff --git a/URSA/experiments/smoke/logs/20260318_151750.log b/URSA/experiments/smoke/logs/20260318_151750.log new file mode 100644 index 0000000000000000000000000000000000000000..de5e8e1c024d14fa270020eca57dcf483f998b0a --- /dev/null +++ b/URSA/experiments/smoke/logs/20260318_151750.log @@ -0,0 +1,79 @@ +03/18 15:17:50 INFO train_distill_dimo.py:813] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 2 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 15:17:50 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
+03/18 15:19:08 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON +03/18 15:19:19 INFO train_distill_dimo.py:252] [init] student params: 1982.17M +03/18 15:19:19 INFO train_distill_dimo.py:255] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=8 +03/18 15:19:19 INFO train_distill_dimo.py:608] [train] Starting from step 0 / 50 +03/18 15:19:58 INFO train_distill_dimo.py:770] [assert] shared_storage=False +03/18 15:19:58 INFO train_distill_dimo.py:776] [assert] param_delta_sample_max=0.000e+00 +03/18 15:19:58 INFO train_distill_dimo.py:784] [assert] logits_delta_sub_max=0.000e+00 diff --git a/URSA/experiments/smoke/logs/20260318_152541.log b/URSA/experiments/smoke/logs/20260318_152541.log new file mode 100644 index 0000000000000000000000000000000000000000..40662771e1eb3a76fcaeed534ee8d955ec5619b1 --- /dev/null +++ b/URSA/experiments/smoke/logs/20260318_152541.log @@ -0,0 +1,72 @@ +03/18 15:25:41 INFO train_distill_dimo.py:827] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 2 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 15:25:41 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
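The [assert] lines in the preceding smoke logs (shared_storage=False, param_delta_sample_max=0.000e+00 before any update, logits_delta_sub_max=0.000e+00) check that the student starts as an exact copy of the teacher, producing bitwise-identical pre-update logits, while owning separate parameter storage, so optimizer steps on the student cannot mutate the teacher. A minimal PyTorch sketch of such checks follows; the script's actual assertion code is not part of this diff.

import torch

def check_student_init(teacher: torch.nn.Module, student: torch.nn.Module,
                       z_T_cond: torch.Tensor, z_S_cond: torch.Tensor) -> None:
    # The two models must not alias the same parameter storage.
    shared = any(pt.data_ptr() == ps.data_ptr()
                 for pt, ps in zip(teacher.parameters(), student.parameters()))
    print(f"[assert] shared_storage={shared}")
    assert not shared, "student must own its parameters"
    # A freshly copied student reproduces the teacher's logits exactly,
    # so the max absolute logit delta before the first step is 0.
    delta = (z_T_cond - z_S_cond).abs().max().item()
    print(f"[assert] logits_delta_sub_max={delta:.3e}")
    assert delta == 0.0, "student logits diverge before any update"

The 20260318_155341.log run further down temporarily sets the student lr to 1 (instead of 1.0e-05), apparently so that param_delta_sample_max(student) becomes clearly nonzero after one step (3.906e-03) rather than vanishing to 0.000e+00 as in the 20260318_154514.log run.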
diff --git a/URSA/experiments/smoke/logs/20260318_152657.log b/URSA/experiments/smoke/logs/20260318_152657.log new file mode 100644 index 0000000000000000000000000000000000000000..12c35b918523642c067687448827d794648aded8 --- /dev/null +++ b/URSA/experiments/smoke/logs/20260318_152657.log @@ -0,0 +1,72 @@ +03/18 15:26:57 INFO train_distill_dimo.py:827] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 2 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 15:26:57 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
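Every config in this series enables teacher classifier-free guidance (enable_teacher_cfg: true, teacher_cfg_scale: 3.0, teacher_cfg_prob: 1.0, teacher_cfg_warmup_steps: 2000, teacher_cfg_trunc: 0.9). Under the standard CFG formulation, the guided teacher logits combine the conditional and unconditional predictions as sketched below; how train_distill_dimo.py applies teacher_cfg_trunc and the warmup schedule is not visible in this diff, so the sketch covers only the core formula.

def guided_teacher_logits(z_cond, z_uncond, cfg_scale=3.0):
    # Standard classifier-free guidance: amplify the conditional signal by
    # pushing it away from the unconditional prediction.
    return z_uncond + cfg_scale * (z_cond - z_uncond)

The use_guided_ratio metric logged by both the smoke and full runs presumably tracks how often this guided distribution is actually used in place of the raw conditional one.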
diff --git a/URSA/experiments/smoke/logs/20260318_152815.log b/URSA/experiments/smoke/logs/20260318_152815.log new file mode 100644 index 0000000000000000000000000000000000000000..796b81bfd2ba431f775127a7399e3918030e7acf --- /dev/null +++ b/URSA/experiments/smoke/logs/20260318_152815.log @@ -0,0 +1,79 @@ +03/18 15:28:15 INFO train_distill_dimo.py:827] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 2 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 15:28:15 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
+03/18 15:29:21 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON +03/18 15:29:30 INFO train_distill_dimo.py:252] [init] student params: 1982.17M +03/18 15:29:30 INFO train_distill_dimo.py:255] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=8 +03/18 15:29:30 INFO train_distill_dimo.py:608] [train] Starting from step 0 / 50 +03/18 15:30:10 INFO train_distill_dimo.py:788] [assert] Step-1 shape/grad assertions PASSED ✓ +03/18 15:30:10 INFO train_distill_dimo.py:789] [assert] z_T_cond shape=torch.Size([1, 5120, 64000]) min=0.766 max=48.500 +03/18 15:30:10 INFO train_distill_dimo.py:794] [assert] z_S_cond shape=torch.Size([1, 5120, 64000]) min=0.766 max=48.500 diff --git a/URSA/experiments/smoke/logs/20260318_153206.log b/URSA/experiments/smoke/logs/20260318_153206.log new file mode 100644 index 0000000000000000000000000000000000000000..0a47b074e179474a7cc4b64df81db02116d0c13c --- /dev/null +++ b/URSA/experiments/smoke/logs/20260318_153206.log @@ -0,0 +1,91 @@ +03/18 15:32:06 INFO train_distill_dimo.py:827] Config: +experiment: + name: distill_dimo + output_dir: ./experiments/smoke + log_every: 50 + save_every: 50 + resume_iter: 0 +training: + seed: 42 + mixed_precision: bf16 + max_train_steps: 50 + gradient_accumulation_steps: 1 +distill: + teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B + prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1 + num_frames: 17 + height: 256 + width: 256 + max_prompt_length: 320 + batch_size_per_gpu: 1 + lambda_kd: 0.5 + lambda_pg: 1.0 + lambda_ent: 0.01 + tau: 1.0 + tau_kd: 1.0 + enable_teacher_cfg: true + teacher_cfg_scale: 3.0 + teacher_cfg_prob: 1.0 + teacher_cfg_warmup_steps: 2000 + teacher_cfg_trunc: 0.9 + lambda_kd_uncond: 0.3 + reward_use_guided: false + fake_rounds: 1 + use_surrogate_grad: false + lambda_surr: 1.0 + t_curriculum_steps: 10000 + p_init_mix_ratio: 0.2 + p_mix_corrupt_frac: 0.2 + collapse_warn_frac: 0.2 + aux_noise_std: 1.0e-05 + grad_clip: 1.0 +optimizer_student: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +optimizer_aux: + target: torch.optim.AdamW + params: + lr: 1.0e-05 + betas: + - 0.9 + - 0.95 + weight_decay: 0.01 +lr_scheduler: + target: diffnext.engine.lr_scheduler.CosineLR + params: + lr_max: ${optimizer_student.params.lr} + lr_min: 1.0e-06 + max_steps: ${training.max_train_steps} + warmup_steps: 500 +prompt_dataloader: + shuffle_files: true + shuffle_buffer: 50000 + num_workers: 2 + caption_field: caption +config: ./configs/distill_dimo.yaml + +03/18 15:32:06 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ... 
+03/18 15:33:14 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON
+03/18 15:33:16 INFO train_distill_dimo.py:252] [init] student params: 1982.17M
+03/18 15:33:16 INFO train_distill_dimo.py:255] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=1
+03/18 15:33:16 INFO train_distill_dimo.py:608] [train] Starting from step 0 / 50
+03/18 15:33:38 INFO train_distill_dimo.py:788] [assert] Step-1 shape/grad assertions PASSED ✓
+03/18 15:33:38 INFO train_distill_dimo.py:789] [assert] z_T_cond shape=torch.Size([1, 5120, 64000]) min=-1.531 max=49.000
+03/18 15:33:38 INFO train_distill_dimo.py:794] [assert] z_S_cond shape=torch.Size([1, 5120, 64000]) min=-1.531 max=49.000
+03/18 15:36:09 INFO train_distill_dimo.py:690] Iteration 50, lr_s=1.01e-06 lr_a=1.01e-06, time=3.05s
+03/18 15:36:09 INFO train_distill_dimo.py:701] Train H_mean: 2.9448 (3.1767)
+03/18 15:36:09 INFO train_distill_dimo.py:701] Train baseline_ema: -0.0013 (-0.0014)
+03/18 15:36:09 INFO train_distill_dimo.py:701] Train loss_aux_cond: 0.0109 (0.0198)
+03/18 15:36:09 INFO train_distill_dimo.py:701] Train loss_kd_cond: 0.0056 (0.0104)
+03/18 15:36:09 INFO train_distill_dimo.py:701] Train loss_pg: -0.0125 (-0.0162)
+03/18 15:36:09 INFO train_distill_dimo.py:701] Train mean_logp_tok: -2.9466 (-3.1757)
+03/18 15:36:09 INFO train_distill_dimo.py:701] Train tok_entropy: 8.4532 (8.4525)
+03/18 15:36:09 INFO train_distill_dimo.py:701] Train use_guided_ratio: 0.0000 (0.0200)
+03/18 15:36:39 INFO train_distill_dimo.py:676] [save] step=50 → ./experiments/smoke/checkpoints/checkpoint-50
+03/18 15:36:39 INFO train_distill_dimo.py:690] Iteration 50, lr_s=1.01e-06 lr_a=1.01e-06, time=3.05s
+03/18 15:37:08 INFO train_distill_dimo.py:676] [save] step=50 → ./experiments/smoke/checkpoints/checkpoint-final
diff --git a/URSA/experiments/smoke/logs/20260318_154020.log b/URSA/experiments/smoke/logs/20260318_154020.log
new file mode 100644
index 0000000000000000000000000000000000000000..e94d9b8dd3b9091316246c71c f95ea3fdbe1a960
--- /dev/null
+++ b/URSA/experiments/smoke/logs/20260318_154020.log
@@ -0,0 +1,73 @@
+03/18 15:40:20 INFO train_distill_dimo.py:918] Config:
+experiment:
+  name: distill_dimo
+  output_dir: ./experiments/smoke
+  log_every: 50
+  save_every: 50
+  resume_iter: 0
+training:
+  seed: 42
+  mixed_precision: bf16
+  max_train_steps: 50
+  gradient_accumulation_steps: 1
+distill:
+  teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B
+  prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1
+  num_frames: 17
+  height: 256
+  width: 256
+  max_prompt_length: 320
+  batch_size_per_gpu: 1
+  lambda_kd: 0.5
+  lambda_pg: 1.0
+  lambda_ent: 0.01
+  tau: 1.0
+  tau_kd: 1.0
+  enable_teacher_cfg: true
+  teacher_cfg_scale: 3.0
+  teacher_cfg_prob: 1.0
+  teacher_cfg_warmup_steps: 2000
+  teacher_cfg_trunc: 0.9
+  lambda_kd_uncond: 0.3
+  reward_use_guided: false
+  fake_rounds: 1
+  use_surrogate_grad: false
+  lambda_surr: 1.0
+  t_curriculum_steps: 10000
+  p_init_mix_ratio: 0.2
+  p_mix_corrupt_frac: 0.2
+  collapse_warn_frac: 0.2
+  aux_noise_std: 1.0e-05
+  grad_clip: 1.0
+optimizer_student:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+optimizer_aux:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+lr_scheduler:
+  target: diffnext.engine.lr_scheduler.CosineLR
+  params:
+    lr_max: ${optimizer_student.params.lr}
+    lr_min: 1.0e-06
+    max_steps: ${training.max_train_steps}
+    warmup_steps: 500
+prompt_dataloader:
+  shuffle_files: true
+  shuffle_buffer: 50000
+  num_workers: 2
+  caption_field: caption
+config: ./configs/distill_dimo.yaml
+
+03/18 15:40:20 INFO train_distill_dimo.py:189] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ...
+03/18 15:41:32 INFO train_distill_dimo.py:212] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON
diff --git a/URSA/experiments/smoke/logs/20260318_154514.log b/URSA/experiments/smoke/logs/20260318_154514.log
new file mode 100644
index 0000000000000000000000000000000000000000..09999187d43fde8ec63dc7600976fb7a81e8b228
--- /dev/null
+++ b/URSA/experiments/smoke/logs/20260318_154514.log
@@ -0,0 +1,80 @@
+03/18 15:45:14 INFO train_distill_dimo.py:920] Config:
+experiment:
+  name: distill_dimo
+  output_dir: ./experiments/smoke
+  log_every: 50
+  save_every: 50
+  resume_iter: 0
+training:
+  seed: 42
+  mixed_precision: bf16
+  max_train_steps: 50
+  gradient_accumulation_steps: 1
+distill:
+  teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B
+  prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1
+  num_frames: 17
+  height: 256
+  width: 256
+  max_prompt_length: 320
+  batch_size_per_gpu: 1
+  lambda_kd: 0.5
+  lambda_pg: 1.0
+  lambda_ent: 0.01
+  tau: 1.0
+  tau_kd: 1.0
+  enable_teacher_cfg: true
+  teacher_cfg_scale: 3.0
+  teacher_cfg_prob: 1.0
+  teacher_cfg_warmup_steps: 2000
+  teacher_cfg_trunc: 0.9
+  lambda_kd_uncond: 0.3
+  reward_use_guided: false
+  fake_rounds: 1
+  use_surrogate_grad: false
+  lambda_surr: 1.0
+  t_curriculum_steps: 10000
+  p_init_mix_ratio: 0.2
+  p_mix_corrupt_frac: 0.2
+  collapse_warn_frac: 0.2
+  aux_noise_std: 1.0e-05
+  grad_clip: 1.0
+optimizer_student:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+optimizer_aux:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+lr_scheduler:
+  target: diffnext.engine.lr_scheduler.CosineLR
+  params:
+    lr_max: ${optimizer_student.params.lr}
+    lr_min: 1.0e-06
+    max_steps: ${training.max_train_steps}
+    warmup_steps: 500
+prompt_dataloader:
+  shuffle_files: true
+  shuffle_buffer: 50000
+  num_workers: 2
+  caption_field: caption
+config: ./configs/distill_dimo.yaml
+
+03/18 15:45:14 INFO train_distill_dimo.py:125] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ...
+03/18 15:46:22 INFO train_distill_dimo.py:148] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON
+03/18 15:46:28 INFO train_distill_dimo.py:205] [assert] shared_storage=False
+03/18 15:46:29 INFO train_distill_dimo.py:260] [init] student params: 1982.17M
+03/18 15:46:29 INFO train_distill_dimo.py:263] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=1
+03/18 15:46:29 INFO train_distill_dimo.py:635] [train] Starting from step 0 / 50
+03/18 15:46:47 INFO train_distill_dimo.py:484] [assert] logits_delta_sub_max=0.000e+00
+03/18 15:46:48 INFO train_distill_dimo.py:563] [assert] param_delta_sample_max(student)=0.000e+00
+03/18 15:46:48 INFO train_distill_dimo.py:564] [assert] param_delta_sample_max(aux)=3.052e-05
diff --git a/URSA/experiments/smoke/logs/20260318_155341.log b/URSA/experiments/smoke/logs/20260318_155341.log
new file mode 100644
index 0000000000000000000000000000000000000000..cb781eef2bed25dad729c8d84029ac0947dea566
--- /dev/null
+++ b/URSA/experiments/smoke/logs/20260318_155341.log
@@ -0,0 +1,80 @@
+03/18 15:53:41 INFO train_distill_dimo.py:920] Config:
+experiment:
+  name: distill_dimo
+  output_dir: ./experiments/smoke
+  log_every: 50
+  save_every: 50
+  resume_iter: 0
+training:
+  seed: 42
+  mixed_precision: bf16
+  max_train_steps: 50
+  gradient_accumulation_steps: 1
+distill:
+  teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B
+  prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1
+  num_frames: 17
+  height: 256
+  width: 256
+  max_prompt_length: 320
+  batch_size_per_gpu: 1
+  lambda_kd: 0.5
+  lambda_pg: 1.0
+  lambda_ent: 0.01
+  tau: 1.0
+  tau_kd: 1.0
+  enable_teacher_cfg: true
+  teacher_cfg_scale: 3.0
+  teacher_cfg_prob: 1.0
+  teacher_cfg_warmup_steps: 2000
+  teacher_cfg_trunc: 0.9
+  lambda_kd_uncond: 0.3
+  reward_use_guided: false
+  fake_rounds: 1
+  use_surrogate_grad: false
+  lambda_surr: 1.0
+  t_curriculum_steps: 10000
+  p_init_mix_ratio: 0.2
+  p_mix_corrupt_frac: 0.2
+  collapse_warn_frac: 0.2
+  aux_noise_std: 1.0e-05
+  grad_clip: 1.0
+optimizer_student:
+  target: torch.optim.AdamW
+  params:
+    lr: 1
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+optimizer_aux:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+lr_scheduler:
+  target: diffnext.engine.lr_scheduler.CosineLR
+  params:
+    lr_max: ${optimizer_student.params.lr}
+    lr_min: 1.0e-06
+    max_steps: ${training.max_train_steps}
+    warmup_steps: 500
+prompt_dataloader:
+  shuffle_files: true
+  shuffle_buffer: 50000
+  num_workers: 2
+  caption_field: caption
+config: ./configs/distill_dimo.yaml
+
+03/18 15:53:41 INFO train_distill_dimo.py:125] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ...
+03/18 15:55:01 INFO train_distill_dimo.py:148] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON
+03/18 15:55:15 INFO train_distill_dimo.py:205] [assert] shared_storage=False
+03/18 15:55:15 INFO train_distill_dimo.py:260] [init] student params: 1982.17M
+03/18 15:55:15 INFO train_distill_dimo.py:263] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=8
+03/18 15:55:15 INFO train_distill_dimo.py:635] [train] Starting from step 0 / 50
+03/18 15:55:51 INFO train_distill_dimo.py:484] [assert] logits_delta_sub_max=0.000e+00
+03/18 15:55:54 INFO train_distill_dimo.py:563] [assert] param_delta_sample_max(student)=3.906e-03
+03/18 15:55:54 INFO train_distill_dimo.py:564] [assert] param_delta_sample_max(aux)=3.052e-05
diff --git a/URSA/experiments/smoke/logs/20260318_155707.log b/URSA/experiments/smoke/logs/20260318_155707.log
new file mode 100644
index 0000000000000000000000000000000000000000..35a3f47ab26a61d89ab543017e8aebbe1877ed0c
--- /dev/null
+++ b/URSA/experiments/smoke/logs/20260318_155707.log
@@ -0,0 +1,83 @@
+03/18 15:57:07 INFO train_distill_dimo.py:920] Config:
+experiment:
+  name: distill_dimo
+  output_dir: ./experiments/smoke
+  log_every: 50
+  save_every: 50
+  resume_iter: 0
+training:
+  seed: 42
+  mixed_precision: bf16
+  max_train_steps: 50
+  gradient_accumulation_steps: 1
+distill:
+  teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B
+  prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1
+  num_frames: 17
+  height: 256
+  width: 256
+  max_prompt_length: 320
+  batch_size_per_gpu: 1
+  lambda_kd: 0.5
+  lambda_pg: 1.0
+  lambda_ent: 0.01
+  tau: 1.0
+  tau_kd: 1.0
+  enable_teacher_cfg: true
+  teacher_cfg_scale: 3.0
+  teacher_cfg_prob: 1.0
+  teacher_cfg_warmup_steps: 2000
+  teacher_cfg_trunc: 0.9
+  lambda_kd_uncond: 0.3
+  reward_use_guided: false
+  fake_rounds: 1
+  use_surrogate_grad: false
+  lambda_surr: 1.0
+  t_curriculum_steps: 10000
+  p_init_mix_ratio: 0.2
+  p_mix_corrupt_frac: 0.2
+  collapse_warn_frac: 0.2
+  aux_noise_std: 1.0e-05
+  grad_clip: 1.0
+optimizer_student:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+optimizer_aux:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+lr_scheduler:
+  target: diffnext.engine.lr_scheduler.CosineLR
+  params:
+    lr_max: ${optimizer_student.params.lr}
+    lr_min: 1.0e-06
+    max_steps: ${training.max_train_steps}
+    warmup_steps: 0
+prompt_dataloader:
+  shuffle_files: true
+  shuffle_buffer: 50000
+  num_workers: 2
+  caption_field: caption
+config: ./configs/distill_dimo.yaml
+
+03/18 15:57:07 INFO train_distill_dimo.py:125] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ...
+03/18 15:58:28 INFO train_distill_dimo.py:148] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON
+03/18 15:58:43 INFO train_distill_dimo.py:205] [assert] shared_storage=False
+03/18 15:58:43 INFO train_distill_dimo.py:260] [init] student params: 1982.17M
+03/18 15:58:43 INFO train_distill_dimo.py:263] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=8
+03/18 15:58:43 INFO train_distill_dimo.py:635] [train] Starting from step 0 / 50
+03/18 15:59:19 INFO train_distill_dimo.py:484] [assert] logits_delta_sub_max=0.000e+00
+03/18 15:59:22 INFO train_distill_dimo.py:563] [assert] param_delta_sample_max(student)=1.526e-05
+03/18 15:59:22 INFO train_distill_dimo.py:564] [assert] param_delta_sample_max(aux)=3.052e-05
+03/18 15:59:22 INFO train_distill_dimo.py:881] [assert] Step-1 shape/grad assertions PASSED ✓
+03/18 15:59:22 INFO train_distill_dimo.py:882] [assert] z_T_cond shape=torch.Size([1, 5120, 64000]) min=-18.625 max=50.750
+03/18 15:59:22 INFO train_distill_dimo.py:887] [assert] z_S_cond shape=torch.Size([1, 5120, 64000]) min=-18.625 max=50.750
diff --git a/URSA/experiments/smoke/logs/20260318_160334.log b/URSA/experiments/smoke/logs/20260318_160334.log
new file mode 100644
index 0000000000000000000000000000000000000000..bd295bb3dcfb6c0f7879dce90fd7a33c3d632e4e
--- /dev/null
+++ b/URSA/experiments/smoke/logs/20260318_160334.log
@@ -0,0 +1,76 @@
+03/18 16:03:34 INFO train_distill_dimo.py:834] Config:
+experiment:
+  name: distill_dimo
+  output_dir: ./experiments/smoke
+  log_every: 50
+  save_every: 50
+  resume_iter: 0
+training:
+  seed: 42
+  mixed_precision: bf16
+  max_train_steps: 50
+  gradient_accumulation_steps: 1
+distill:
+  teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B
+  prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1
+  num_frames: 17
+  height: 256
+  width: 256
+  max_prompt_length: 320
+  batch_size_per_gpu: 1
+  lambda_kd: 0.5
+  lambda_pg: 1.0
+  lambda_ent: 0.01
+  tau: 1.0
+  tau_kd: 1.0
+  enable_teacher_cfg: true
+  teacher_cfg_scale: 3.0
+  teacher_cfg_prob: 1.0
+  teacher_cfg_warmup_steps: 2000
+  teacher_cfg_trunc: 0.9
+  lambda_kd_uncond: 0.3
+  reward_use_guided: false
+  fake_rounds: 1
+  use_surrogate_grad: false
+  lambda_surr: 1.0
+  t_curriculum_steps: 10000
+  p_init_mix_ratio: 0.2
+  p_mix_corrupt_frac: 0.2
+  collapse_warn_frac: 0.2
+  aux_noise_std: 1.0e-05
+  grad_clip: 1.0
+optimizer_student:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+optimizer_aux:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+lr_scheduler:
+  target: diffnext.engine.lr_scheduler.CosineLR
+  params:
+    lr_max: ${optimizer_student.params.lr}
+    lr_min: 1.0e-06
+    max_steps: ${training.max_train_steps}
+    warmup_steps: 0
+prompt_dataloader:
+  shuffle_files: true
+  shuffle_buffer: 50000
+  num_workers: 2
+  caption_field: caption
+config: ./configs/distill_dimo.yaml
+
+03/18 16:03:34 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ...
+03/18 16:04:56 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON
+03/18 16:05:08 INFO train_distill_dimo.py:252] [init] student params: 1982.17M
+03/18 16:05:08 INFO train_distill_dimo.py:255] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=8
+03/18 16:05:08 INFO train_distill_dimo.py:615] [train] Starting from step 0 / 50
diff --git a/URSA/experiments/smoke/logs/20260318_160719.log b/URSA/experiments/smoke/logs/20260318_160719.log
new file mode 100644
index 0000000000000000000000000000000000000000..f9a45d94225e441a84a12130af13dd4e0204c205
--- /dev/null
+++ b/URSA/experiments/smoke/logs/20260318_160719.log
@@ -0,0 +1,79 @@
+03/18 16:07:19 INFO train_distill_dimo.py:834] Config:
+experiment:
+  name: distill_dimo
+  output_dir: ./experiments/smoke
+  log_every: 50
+  save_every: 50
+  resume_iter: 0
+training:
+  seed: 42
+  mixed_precision: bf16
+  max_train_steps: 50
+  gradient_accumulation_steps: 1
+distill:
+  teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B
+  prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1
+  num_frames: 17
+  height: 256
+  width: 256
+  max_prompt_length: 320
+  batch_size_per_gpu: 1
+  lambda_kd: 0.5
+  lambda_pg: 1.0
+  lambda_ent: 0.01
+  tau: 1.0
+  tau_kd: 1.0
+  enable_teacher_cfg: true
+  teacher_cfg_scale: 3.0
+  teacher_cfg_prob: 1.0
+  teacher_cfg_warmup_steps: 2000
+  teacher_cfg_trunc: 0.9
+  lambda_kd_uncond: 0.3
+  reward_use_guided: false
+  fake_rounds: 1
+  use_surrogate_grad: false
+  lambda_surr: 1.0
+  t_curriculum_steps: 10000
+  p_init_mix_ratio: 0.2
+  p_mix_corrupt_frac: 0.2
+  collapse_warn_frac: 0.2
+  aux_noise_std: 1.0e-05
+  grad_clip: 1.0
+optimizer_student:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+optimizer_aux:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+lr_scheduler:
+  target: diffnext.engine.lr_scheduler.CosineLR
+  params:
+    lr_max: ${optimizer_student.params.lr}
+    lr_min: 1.0e-06
+    max_steps: 100,
+    warmup_steps: 500
+prompt_dataloader:
+  shuffle_files: true
+  shuffle_buffer: 50000
+  num_workers: 2
+  caption_field: caption
+config: ./configs/distill_dimo.yaml
+
+03/18 16:07:19 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ...
+03/18 16:08:42 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON
+03/18 16:09:01 INFO train_distill_dimo.py:252] [init] student params: 1982.17M
+03/18 16:09:01 INFO train_distill_dimo.py:255] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=8
+03/18 16:09:01 INFO train_distill_dimo.py:615] [train] Starting from step 0 / 50
+03/18 16:09:40 INFO train_distill_dimo.py:795] [assert] Step-1 shape/grad assertions PASSED ✓
+03/18 16:09:40 INFO train_distill_dimo.py:796] [assert] z_T_cond shape=torch.Size([1, 5120, 64000]) min=0.766 max=48.500
+03/18 16:09:40 INFO train_distill_dimo.py:801] [assert] z_S_cond shape=torch.Size([1, 5120, 64000]) min=0.766 max=48.500
diff --git a/URSA/experiments/smoke/logs/20260318_161100.log b/URSA/experiments/smoke/logs/20260318_161100.log
new file mode 100644
index 0000000000000000000000000000000000000000..71191d69fea4842a1523e97f67c29a2e30158459
--- /dev/null
+++ b/URSA/experiments/smoke/logs/20260318_161100.log
@@ -0,0 +1,88 @@
+03/18 16:11:00 INFO train_distill_dimo.py:834] Config:
+experiment:
+  name: distill_dimo
+  output_dir: ./experiments/smoke
+  log_every: 50
+  save_every: 50
+  resume_iter: 0
+training:
+  seed: 42
+  mixed_precision: bf16
+  max_train_steps: 50
+  gradient_accumulation_steps: 1
+distill:
+  teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B
+  prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1
+  num_frames: 17
+  height: 256
+  width: 256
+  max_prompt_length: 320
+  batch_size_per_gpu: 1
+  lambda_kd: 0.5
+  lambda_pg: 1.0
+  lambda_ent: 0.01
+  tau: 1.0
+  tau_kd: 1.0
+  enable_teacher_cfg: true
+  teacher_cfg_scale: 3.0
+  teacher_cfg_prob: 1.0
+  teacher_cfg_warmup_steps: 2000
+  teacher_cfg_trunc: 0.9
+  lambda_kd_uncond: 0.3
+  reward_use_guided: false
+  fake_rounds: 1
+  use_surrogate_grad: false
+  lambda_surr: 1.0
+  t_curriculum_steps: 10000
+  p_init_mix_ratio: 0.2
+  p_mix_corrupt_frac: 0.2
+  collapse_warn_frac: 0.2
+  aux_noise_std: 1.0e-05
+  grad_clip: 1.0
+optimizer_student:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+optimizer_aux:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+lr_scheduler:
+  target: diffnext.engine.lr_scheduler.CosineLR
+  params:
+    lr_max: ${optimizer_student.params.lr}
+    lr_min: 1.0e-06
+    max_steps: ${training.max_train_steps}
+    warmup_steps: 500
+prompt_dataloader:
+  shuffle_files: true
+  shuffle_buffer: 50000
+  num_workers: 4
+  caption_field: caption
+config: ./configs/distill_dimo.yaml
+
+03/18 16:11:00 INFO train_distill_dimo.py:123] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ...
+03/18 16:12:19 INFO train_distill_dimo.py:146] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON
+03/18 16:12:33 INFO train_distill_dimo.py:252] [init] student params: 1982.17M
+03/18 16:12:33 INFO train_distill_dimo.py:255] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=8
+03/18 16:12:33 INFO train_distill_dimo.py:615] [train] Starting from step 0 / 50
+03/18 16:15:53 INFO train_distill_dimo.py:697] Iteration 50, lr_s=1.01e-06 lr_a=1.01e-06, time=3.10s
+03/18 16:15:53 INFO train_distill_dimo.py:708] Train H_mean: 3.1028 (3.2100)
+03/18 16:15:53 INFO train_distill_dimo.py:708] Train baseline_ema: -0.0011 (-0.0011)
+03/18 16:15:53 INFO train_distill_dimo.py:708] Train loss_aux_cond: 0.0109 (0.0370)
+03/18 16:15:53 INFO train_distill_dimo.py:708] Train loss_kd_cond: 0.0057 (0.0200)
+03/18 16:15:53 INFO train_distill_dimo.py:708] Train loss_pg: -0.0134 (-0.0141)
+03/18 16:15:53 INFO train_distill_dimo.py:708] Train mean_logp_tok: -3.1060 (-3.2138)
+03/18 16:15:53 INFO train_distill_dimo.py:708] Train tok_entropy: 8.4529 (8.4528)
+03/18 16:15:53 INFO train_distill_dimo.py:708] Train use_guided_ratio: 0.0000 (0.0200)
+03/18 16:16:21 INFO train_distill_dimo.py:683] [save] step=50 → ./experiments/smoke/checkpoints/checkpoint-50
+03/18 16:16:21 INFO train_distill_dimo.py:697] Iteration 50, lr_s=1.01e-06 lr_a=1.01e-06, time=3.10s
+03/18 16:16:48 INFO train_distill_dimo.py:683] [save] step=50 → ./experiments/smoke/checkpoints/checkpoint-final
diff --git a/URSA/experiments/smoke/logs/20260318_192154.log b/URSA/experiments/smoke/logs/20260318_192154.log
new file mode 100644
index 0000000000000000000000000000000000000000..a466104b0f6f709cf90913a74fbf3442a134818c
--- /dev/null
+++ b/URSA/experiments/smoke/logs/20260318_192154.log
@@ -0,0 +1,76 @@
+03/18 19:21:54 INFO train_distill_dimo.py:871] Config:
+experiment:
+  name: distill_dimo
+  output_dir: ./experiments/smoke
+  log_every: 50
+  save_every: 50
+  resume_iter: 0
+training:
+  seed: 42
+  mixed_precision: bf16
+  max_train_steps: 50
+  gradient_accumulation_steps: 1
+distill:
+  teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B
+  prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1
+  num_frames: 17
+  height: 256
+  width: 256
+  max_prompt_length: 320
+  batch_size_per_gpu: 1
+  lambda_kd: 0.5
+  lambda_pg: 1.0
+  lambda_ent: 0.01
+  tau: 1.0
+  tau_kd: 1.0
+  enable_teacher_cfg: true
+  teacher_cfg_scale: 3.0
+  teacher_cfg_prob: 1.0
+  teacher_cfg_warmup_steps: 2000
+  teacher_cfg_trunc: 0.9
+  lambda_kd_uncond: 0.3
+  reward_use_guided: false
+  fake_rounds: 1
+  use_surrogate_grad: false
+  lambda_surr: 1.0
+  t_curriculum_steps: 10000
+  p_init_mix_ratio: 0.2
+  p_mix_corrupt_frac: 0.2
+  collapse_warn_frac: 0.2
+  aux_noise_std: 1.0e-05
+  grad_clip: 1.0
+optimizer_student:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+optimizer_aux:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+lr_scheduler:
+  target: diffnext.engine.lr_scheduler.CosineLR
+  params:
+    lr_max: ${optimizer_student.params.lr}
+    lr_min: 1.0e-06
+    max_steps: ${training.max_train_steps}
+    warmup_steps: 500
+prompt_dataloader:
+  shuffle_files: true
+  shuffle_buffer: 50000
+  num_workers: 4
+  caption_field: caption
+config: ./configs/distill_dimo.yaml
+
+03/18 19:21:54 INFO train_distill_dimo.py:153] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ...
+03/18 19:23:09 INFO train_distill_dimo.py:176] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON
+03/18 19:23:23 INFO train_distill_dimo.py:279] [init] student params: 1982.17M
+03/18 19:23:23 INFO train_distill_dimo.py:282] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=8
+03/18 19:23:23 INFO train_distill_dimo.py:653] [train] Starting from step 0 / 50
diff --git a/URSA/experiments/smoke/logs/20260318_192913.log b/URSA/experiments/smoke/logs/20260318_192913.log
new file mode 100644
index 0000000000000000000000000000000000000000..4f84cf9ebb52da2598e53acb7445a119921eefa4
--- /dev/null
+++ b/URSA/experiments/smoke/logs/20260318_192913.log
@@ -0,0 +1,88 @@
+03/18 19:29:13 INFO train_distill_dimo.py:871] Config:
+experiment:
+  name: distill_dimo
+  output_dir: ./experiments/smoke
+  log_every: 50
+  save_every: 50
+  resume_iter: 0
+training:
+  seed: 42
+  mixed_precision: bf16
+  max_train_steps: 50
+  gradient_accumulation_steps: 1
+distill:
+  teacher_ckpt: /gfs/space/private/fengzl/World_Model/URSA-1.7B
+  prompt_source: /gfs/space/private/fengzl/World_Model/Koala-36M-v1
+  num_frames: 17
+  height: 256
+  width: 256
+  max_prompt_length: 320
+  batch_size_per_gpu: 1
+  lambda_kd: 0.5
+  lambda_pg: 1.0
+  lambda_ent: 0.01
+  tau: 1.0
+  tau_kd: 1.0
+  enable_teacher_cfg: true
+  teacher_cfg_scale: 3.0
+  teacher_cfg_prob: 1.0
+  teacher_cfg_warmup_steps: 2000
+  teacher_cfg_trunc: 0.9
+  lambda_kd_uncond: 0.3
+  reward_use_guided: false
+  fake_rounds: 1
+  use_surrogate_grad: false
+  lambda_surr: 1.0
+  t_curriculum_steps: 10000
+  p_init_mix_ratio: 0.2
+  p_mix_corrupt_frac: 0.2
+  collapse_warn_frac: 0.2
+  aux_noise_std: 1.0e-05
+  grad_clip: 1.0
+optimizer_student:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+optimizer_aux:
+  target: torch.optim.AdamW
+  params:
+    lr: 1.0e-05
+    betas:
+    - 0.9
+    - 0.95
+    weight_decay: 0.01
+lr_scheduler:
+  target: diffnext.engine.lr_scheduler.CosineLR
+  params:
+    lr_max: ${optimizer_student.params.lr}
+    lr_min: 1.0e-06
+    max_steps: ${training.max_train_steps}
+    warmup_steps: 500
+prompt_dataloader:
+  shuffle_files: true
+  shuffle_buffer: 50000
+  num_workers: 4
+  caption_field: caption
+config: ./configs/distill_dimo.yaml
+
+03/18 19:29:13 INFO train_distill_dimo.py:153] [init] Loading teacher from /gfs/space/private/fengzl/World_Model/URSA-1.7B ...
+03/18 19:30:32 INFO train_distill_dimo.py:176] [init] latents_shape=(5,32,32) N=5120 K=64000 CFG=ON
+03/18 19:30:44 INFO train_distill_dimo.py:279] [init] student params: 1982.17M
+03/18 19:30:44 INFO train_distill_dimo.py:282] [init] max_train_steps=50 batch_size_per_gpu=1 num_processes=8
+03/18 19:30:44 INFO train_distill_dimo.py:653] [train] Starting from step 0 / 50
+03/18 19:34:12 INFO train_distill_dimo.py:734] Iteration 50, lr_s=1.01e-06 lr_a=1.01e-06, time=3.63s
+03/18 19:34:12 INFO train_distill_dimo.py:745] Train H_mean: 3.0307 (3.1918)
+03/18 19:34:12 INFO train_distill_dimo.py:745] Train baseline_ema: -0.0012 (-0.0012)
+03/18 19:34:12 INFO train_distill_dimo.py:745] Train loss_aux_cond: 0.0109 (0.0270)
+03/18 19:34:12 INFO train_distill_dimo.py:745] Train loss_kd_cond: 0.0057 (0.0146)
+03/18 19:34:12 INFO train_distill_dimo.py:745] Train loss_pg: -0.0140 (-0.0144)
+03/18 19:34:12 INFO train_distill_dimo.py:745] Train mean_logp_tok: -3.0395 (-3.1907)
+03/18 19:34:12 INFO train_distill_dimo.py:745] Train tok_entropy: 8.4536 (8.4540)
+03/18 19:34:12 INFO train_distill_dimo.py:745] Train use_guided_ratio: 0.0000 (0.0200)
+03/18 19:34:42 INFO train_distill_dimo.py:720] [save] step=50 → ./experiments/smoke/checkpoints/checkpoint-50
+03/18 19:34:42 INFO train_distill_dimo.py:734] Iteration 50, lr_s=1.01e-06 lr_a=1.01e-06, time=3.63s
+03/18 19:35:11 INFO train_distill_dimo.py:720] [save] step=50 → ./experiments/smoke/checkpoints/checkpoint-final
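
Note on the config convention: every dump above describes objects as a target/params pair (torch.optim.AdamW, diffnext.engine.lr_scheduler.CosineLR), with OmegaConf-style ${...} interpolation for cross-references such as ${optimizer_student.params.lr}. The factory that resolves these nodes is not part of this diff, so the helper below is only a minimal sketch of the convention; the name instantiate is ours, and it assumes interpolation has already been resolved.

import importlib

def instantiate(node, *args):
    # Build an object from a {target, params} config node.
    # Sketch only: the project's real factory inside diffnext may differ.
    module_name, attr = node["target"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_name), attr)
    return cls(*args, **node.get("params", {}))

# Hypothetical usage mirroring the dumps above:
#   opt_s = instantiate(cfg["optimizer_student"], student.parameters())
#   sched = instantiate(cfg["lr_scheduler"])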
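
Note on the logged learning rate: the runs that reach iteration 50 all report lr_s=1.01e-06 even though lr_max is 1.0e-05. With warmup_steps=500 larger than max_steps=50, a linear warmup never completes and gives 1.0e-05 * 50 / 500 = 1.0e-06 at step 50; a cosine decay to lr_min=1.0e-06 over 50 steps lands at the same value, so the logs are consistent with either reading. A minimal warmup-plus-cosine schedule of the kind CosineLR presumably implements (its source is not shown in this diff):

import math

def cosine_lr(step, lr_max=1.0e-05, lr_min=1.0e-06, max_steps=50, warmup_steps=500):
    # Linear warmup, then cosine decay from lr_max to lr_min. Sketch only;
    # diffnext.engine.lr_scheduler.CosineLR may index or clamp differently.
    if step < warmup_steps:
        return lr_max * step / max(1, warmup_steps)
    t = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * min(t, 1.0)))

# With the smoke-test values the run ends inside warmup:
# cosine_lr(50) == 1.0e-05 * 50 / 500 == 1.0e-06, matching the logged
# lr_s=1.01e-06 up to rounding and step-indexing convention.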
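
Note on the [assert] sequence: the 15:45 run reports param_delta_sample_max(student)=0.000e+00 after one step; the 15:53 run raises the student lr from 1.0e-05 to 1 as a probe and sees 3.906e-03; the 15:57 run restores lr=1.0e-05 but sets warmup_steps: 0 and gets a nonzero 1.526e-05. That progression is consistent with the 500-step warmup shrinking the first-step lr to about 2e-8, small enough for the bf16 update to round to zero; the reported deltas (2^-8, 2^-15, 2^-16) are exact powers of two, which also fits bf16 weight quantization, though this is our inference rather than anything the logs state. The check itself is presumably along these lines; the function signature and snapshot mechanism are guesses:

def param_delta_sample_max(params, snapshot):
    # Largest absolute elementwise change of torch parameters versus a
    # pre-step snapshot. Sketch of the smoke-test assertion above; the real
    # check in train_distill_dimo.py may sample only a subset of tensors.
    return max((p.detach() - s).abs().max().item()
               for p, s in zip(params, snapshot))

# Hypothetical usage around one optimizer step:
#   snap = [p.detach().clone() for p in student.parameters()]
#   ... one training step ...
#   log(param_delta_sample_max(student.parameters(), snap))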
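
Note on the loss terms: the logged metrics (loss_kd_cond, loss_pg, tok_entropy, baseline_ema) line up with the lambda_* weights in the distill block, but the diff never shows the actual objective. The combination below is therefore only one plausible reading of those names: weighted conditional and unconditional KD terms plus a policy-gradient term, with an entropy bonus scaled by lambda_ent to discourage token collapse (cf. collapse_warn_frac):

def total_loss(loss_kd_cond, loss_kd_uncond, loss_pg, tok_entropy,
               lambda_kd=0.5, lambda_kd_uncond=0.3,
               lambda_pg=1.0, lambda_ent=0.01):
    # Hypothetical combination of the logged terms; the real objective in
    # train_distill_dimo.py is not part of this diff.
    return (lambda_kd * loss_kd_cond
            + lambda_kd_uncond * loss_kd_uncond
            + lambda_pg * loss_pg
            - lambda_ent * tok_entropy)  # entropy as a bonus (assumed)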