I am getting CUDA Out-Of-Memory on a Tesla T4, but only with these prompts...


Hello, I need a little help regarding a CUDA memory error.
I have tried to find a solution... but I think it is due to the limited GPU RAM available. I am using this model for generative QA, and for the instruction tasks it works AMAZINGLY!

Since it is about 12 GB in size, it fits on the GPU and works on smaller tasks, such as identifying a sentiment or writing an SQL query. But when it is used for question answering over documents, the prompt gets larger and it gives this error:

OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 14.62 GiB total capacity; 13.28 GiB 
already allocated; 15.94 MiB free; 13.89 GiB reserved in total by PyTorch) If reserved memory is >> allocated 
memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and 
PYTORCH_CUDA_ALLOC_CONF

Well, I was able to generate a response once for a prompt, for example:

Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I don't know."

Context:
* World Heritage Convention...
* Former UNESCO World Heritage...
* Natural heritage Legal...
* List of World Heritage Sites...
* World Heritage Committee Summary...

Question: What are the minimum criteria to be declared as the world heritage site?
Answer: The minimum criteria to be declared as a World Heritage Site is that the site must meet the following criteria: 1) be of outstanding universal value; 2) be of outstanding cultural or natural importance; 3) be located in a region of outstanding natural beauty; 4) be well-preserved and properly managed; and 5) be accessible to the public. Additionally, the site must also pass a subjective test, in which the World Heritage Committee evaluates the importance of the site in comparison to other sites around the world.

It delivered a really high-quality response, but trying it again gives the error.

I've tried clearing the cache, but it didn't help. Any hope?
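For what it's worth, by "clearing the cache" I mean something like this, plus the max_split_size_mb hint from the error message (I'm not sure I'm using it correctly):

import os
# This should be set before the first CUDA allocation to take effect
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import gc
import torch

gc.collect()
torch.cuda.empty_cache()  # release PyTorch's cached blocks back to the driver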

NLP Cloud org

Great to see it is working well for you!
Yes, unfortunately the above is expected. GPT-J fits on a 12GB GPU, but it is borderline... So if you try to generate too many tokens, or if your input is too large, you will quickly run into OOM errors.
If you really want to use the full GPT-J 2048-token context, I recommend that you use a GPU with at least 20GB of VRAM.
Good luck with that!

Can't we clear the cache or free the used memory? I can observe that on the first run (a fresh start, when the model is loaded) the GPU is using some XX GB. But after a couple of prompts, the usage rises. Does this happen? Can't we free that last used memory to make it available for the next prompt?

Sorry if my wording is confusing; all I mean is, suppose:
- I load the model
- The GPU has around 10GB (say) allocated
- I run: "Translate this to this"
- The GPU now has 11GB allocated
- Then I run "Answer my question based on the context"
- It possibly runs fine.
- The GPU now has 13GB allocated.
- Then I run "Answer my question for a new context"
- ERROR.

So the thing is, memory stays in use after each prompt. Is that freeable? Or am I missing something?
And yes, I am not loading the model multiple times, so that is not the cause of the increasing usage.
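In case it matters, this is the kind of cleanup I mean between prompts (just a sketch; batch and output_tokens are whatever the previous generation produced):

import gc
import torch

# Drop the references to the previous prompt's tensors,
# then ask PyTorch to release its cached blocks.
del batch, output_tokens
gc.collect()
torch.cuda.empty_cache()

print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")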

Thanks @juliensalinas for your response πŸ€—

NLP Cloud org

Yes, you're absolutely right, you might see the VRAM usage increase, but this is only because PyTorch is caching some memory on the GPU. You can still use that VRAM even though it is cached. In theory you can make the same request a second time without running OOM.
If you are making the same request several times and at some point you get an OOM, it means that there's a problem indeed...
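One rough way to check whether it is just the allocator cache or an actual leak (not specific to this model):

import torch

# "allocated" is memory held by live tensors; "reserved" also includes
# blocks PyTorch keeps cached for reuse.
allocated = torch.cuda.memory_allocated() / 1024**3
reserved = torch.cuda.memory_reserved() / 1024**3
print(f"allocated: {allocated:.2f} GiB, reserved: {reserved:.2f} GiB")

If "allocated" keeps growing across identical requests, something is holding references to old tensors; if only "reserved" grows, it is mostly the caching allocator at work.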

Hmm, I think a GPU with higher VRAM is the solution, as you mentioned, around 20GB. Alright, I think I should opt for that instead.
And please don't mind me asking more questions, but I wonder about querying structured data.

Like:
I have data in two forms:

  1. Tabular data
  2. Textual (unstructured) data.

Suppose a scenario where I have 5 files.
1. A CSV with all information about the COVID-19 cases in the world
2, 3, 4, & 5 are the unstructured (text) files (Wikipedia pages) on the Olympics 2020.

Now, if I ask something like: "What was the reason behind postponing the Olympics 2020?"
Then the model should be able to answer from all the data given to it, including the table. For example:

In 2020 the world was suffering from COVID-19, including Japan, where the Olympics were supposed to be held. At that time, Japan had around 2,000 daily cases, and to prevent the spread, the Olympics were postponed.

(Of course, the response is made up.) But as we can see, the model is able to pick up the number of active cases, "2,000", from the CSV and integrate it with the rest of the story from the text data (assuming that number wasn't in the text data).

Or say, in a second use case, I give my sales data as a CSV and then ask something like: "How are my sales performing in the last 12 months, and which are the significant factors for it?" Then we can imagine that the model should answer with some trend, some figures, etc., after analyzing the data.

The question is: is it possible?
And if yes, how can we achieve such a generative response from the model? What could the pipeline look like? And is it possible with open-source frameworks such as Haystack or LangChain?
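Roughly, this is the kind of pipeline I imagine, independent of any specific framework (the file names and the retriever are made up; the idea is to turn CSV rows into text passages so the same retriever can index both sources):

import csv

def csv_to_passages(path):
    # Turn each CSV row into a short text passage so it can be indexed
    # and retrieved just like the unstructured documents.
    with open(path, newline="") as f:
        return [", ".join(f"{k}: {v}" for k, v in row.items())
                for row in csv.DictReader(f)]

def build_prompt(question, passages):
    context = "\n".join(f"* {p}" for p in passages)
    return ("Answer the question as truthfully as possible using the provided context, "
            'and if the answer is not contained within the text below, say "I don\'t know."\n\n'
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

# Hypothetical usage: covid_cases.csv plus the Olympics text files would be
# indexed by whatever retriever Haystack or LangChain provides, e.g.:
# passages = retriever.retrieve("Why were the Olympics 2020 postponed?", top_k=3)
# print(build_prompt("Why were the Olympics 2020 postponed?", passages))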

Please help. Thanks πŸ™

re: OOM, i am not sure if using methods like deepspeed mii or int8 would help.

are you running this on an EC2? if so, what instance size?

Hello, @silvacarl
I am using an AWS SageMaker ml.g4dn.2xlarge notebook instance. It has 32GB of memory and a Tesla T4 GPU.

i have not used sagemaker so i cannot say anything about that. but you should be able to run GPT-J-6B size models easily on the g4dn.2xlarge, since we have tested that type of configuration.

the g4dn instance types are actually usable, except the smallest one.

but we set up ubuntu 20.04 and then RDP into our EC2, so we do not use any of the extra overhead of sagemaker or a notebook. it's possible that those two extra pieces use just enough extra resources to cause a problem.

this problem will not go away on any of the g4dn instance types if you are having it, since all EC2 instance sizes in the same instance family use the same GPUs.

you should also check out a g5.xlarge.

if you are willing to try other hosted providers, check out lambda or paperspace or coreweave, they all work well as well.

at the end of the day, depending on your call volume, it could be much easier to leverage what nlpcloud's API has in place if you can use one of their hosted models, since you will not have to worry about these issues.

Thanks for your response @silvacarl ,
I think now I see a clearer picture. I need a higher-end GPU with more VRAM.

Now I am thinking of "fine-tuning" the model. Please don't laugh, because I am not even able to load the model for inference, so how can I get into fine-tuning! I understand, but I think I should fine-tune the model on my own documents, so that at inference time I just have to pass the "question" in the prompt and nothing else.

So the model can answer from its own knowledge.


I have followed the PEFT with LoRA method from video-link-1 & video-link-2, and I think that after freezing 99% of the layers, I should be able to adapter-tune GPT-J on my own data. And if not, I should use a smaller model.
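For reference, this is roughly the PEFT setup I have in mind from those videos (a sketch I have not fully verified; the target module names are my assumption for GPT-J's attention projections):

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("nlpcloud/instruct-gpt-j-fp16",
                                             torch_dtype=torch.float16,
                                             device_map="auto")

# Only the small LoRA adapters on the attention projections are trained;
# the original weights stay frozen.
lora_config = LoraConfig(task_type=TaskType.CAUSAL_LM,
                         r=8,
                         lora_alpha=32,
                         lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"])  # assumed GPT-J module names

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()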

My question is, how should I pass the prompt?
I mean, during the training, how should I structure my prompt?

Should I just give the raw text as-is?
or
Should I do some prompt engineering, like Context:{} Question:{} Answer:{}, for the model?
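For example, something like this is what I mean by the second option (a hypothetical formatting function, just to illustrate):

def format_example(context: str, question: str, answer: str) -> str:
    # One (context, question, answer) triple becomes a single training
    # string, mirroring the prompt I use at inference time.
    return ("Answer the question as truthfully as possible using the provided context, "
            'and if the answer is not contained within the text below, say "I don\'t know."\n\n'
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer: {answer}")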

Will you please shed some light on this?
And please pardon my noob questions πŸ˜“

Thanks for your help mate! πŸ™

you can easily fine-tune this: https://github.com/mallorbc/Finetune_LLMs

or if you want an easy way to do it, use the nlpcloud fine tuning plan.

Hello there! @juliensalinas how are you doing!

As you suggested:

Yes, unfortunately the above is expected. GPT-J fits on a 12GB GPU, but it is borderline... So if you try to generate too many tokens, or if your input is too large, you will quickly run into OOM errors.
If you really want to use the full GPT-J 2048-token context, I recommend that you use a GPU with at least 20GB of VRAM.

I have tried this model on a 46 GiB VRAM GPU (an A6000), but guess what? It gives an OOM error.

CUDA out of memory. Tried to allocate 240.00 MiB (GPU 0; 47.54 GiB total capacity; 45.97 GiB already allocated; 223.12 MiB free; 46.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I mean, I used more than double the VRAM needed by the model. Yes, my context length is huge, around 1500 ~ 1700 tokens, but as you've suggested, this should not happen, right?


I am using this code:

import gc
import torch

gc.collect()
torch.cuda.empty_cache()

# expect the prompt to be 1500 ~ 1700 tokens
batch = tokenizer(prompt, return_tensors='pt').to(device)

output_tokens = model.generate(**batch,
                               temperature=0.9,
                               min_length=15,
                               early_stopping=True,
                               num_beams=4,
                               no_repeat_ngram_size=2,
                               top_k=40, top_p=0.7,
                               max_new_tokens=200,
                               penalty_alpha=0.6,
                               use_cache=False,
                               pad_token_id=tokenizer.eos_token_id)

As raised earlier, the model can run for the first time or two, but after that (even after clearing the cache) I am getting the OOM error.
How big should my GPU be?
Please help, thanks πŸ™

NLP Cloud org

Hello @AayushShah ,
This is really surprising; in theory you should not have any problem running this model with so much VRAM...
Did you try with the code I put in the README?

from transformers import pipeline
import torch

generator = pipeline(model="nlpcloud/instruct-gpt-j-fp16", torch_dtype=torch.float16, device=0)

prompt = "Correct spelling and grammar from the following text.\nI do not wan to go\n"

print(generator(prompt))

Hey @juliensalinas, πŸ€—
Yes! Indeed, I can run the model on the sample tasks given in the README and also on other small-prompt tasks such as SQL generation and some few-shot prompts, but as said, it somehow starts giving OOM errors on the bigger prompts (not too big, just around ~1500 to ~1900 tokens max).

I am sharing the exact script I am using:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("nlpcloud/instruct-gpt-j-fp16",
                                             torch_dtype=torch.float16,
                                             load_in_8bit=True,
                                             low_cpu_mem_usage=True,
                                             device_map='auto')

tokenizer = AutoTokenizer.from_pretrained("nlpcloud/instruct-gpt-j-fp16")

Then I have a retriever which fetches the top 3 documents from the document store, and after checking the length and all other processing, I create the prompt like:

Prompt-1:

Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I don't know."

Context:
* 2020 Summer Olympics Sports  The event program for the 2020 Summer Olympics was approved by the IOC executive board on 9 June 2017. IOC president Thomas Bach stated that their goal was to give the Games "youthful" and "urban" appeal, and to increase the number of female participants.The Games featured 339 events in 33 different sports, encompassing a total of 50 disciplines. Karate, sport climbing, surfing, and skateboarding made their Olympic debut, while baseball and softball also made a one-off return to the Summer Olympics for the first time since 2008. 15 new events within existing sports were also added, including 3Γ—3 basketball, freestyle BMX, and the return of madison cycling, as well as 9 new mixed events in several sports (table tennis, archery, judo, shooting (3), triathlon, 4 Γ— 400 m relay running and 4 Γ— 100 m medley swimming).In the list below, the number of events in each discipline is noted in parentheses.
* 2020 Summer Olympics medal table Summary  The 2020 Summer Olympics, officially known as the Games of the XXXII Olympiad, was an international multi-sport event held in Tokyo, Japan, from 23 July to 8 August 2021. The games were postponed by one year as part of the impact of the COVID-19 pandemic on sports. However, the Games was referred to by its original date in all medals, uniforms, promotional items, and other related media in order to avoid confusion in future years. A total of 11,417 athletes from 206 nations participated in 339 events in 33 sports across 50 different disciplines.Overall, the event saw two records: 93 nations received at least one medal, and 65 of them won at least one gold medal. Athletes from the United States won the most medals overall, with 113, and the most gold medals, with 39. Host nation Japan won 27 gold medals surpassing its gold medal tally of 16 at both the 1964 and 2004 summer editions. Athletes from that nation also won 58 medals overall, which eclipsed its record of 41 overall medals won at the previous Summer Olympics.American swimmer Caeleb Dressel won the most gold medals at the games with five. Meanwhile, Australian swimmer Emma McKeon won the greatest number of medals overall, with seven in total. As a result, she tied Soviet gymnast Maria Gorokhovskaya's seven medals at the 1952 summer edition for most medals won at a single games by a female athlete. Bermuda, Qatar, and the Philippines won their nation's first Olympic gold medals. Meanwhile, Burkina Faso, Turkmenistan, and San Marino won their nation's first Olympic medals. However, Turkmenistani athletes had previously competed as nationals of the Russian Empire and of the Soviet Union, in particular during the 1992 Summer Olympics, Turkmenistani athletes competed as part of the Unified Team.
* 2020 Summer Olympics New sports  On 12 February 2013, with a remit to control the cost of the Games and ensure they are "relevant to sports fans of all generations", the IOC Executive Board recommended the removal of one of the 26 sports contested at the 2012 Summer Olympics, leaving a vacancy which the IOC would seek to fill at the 125th IOC Session. The new entrant would join golf and rugby sevens (which would both debut in 2016) as part of the program of 28 "core" sports. Five sports were shortlisted for removal, including canoe, field hockey, modern pentathlon, taekwondo, and wrestling. In the final round of voting by the executive board, eight members voted to remove wrestling from the Olympic program. Hockey and taekwondo were both tied in second with three votes each.The decision to drop wrestling surprised many media outlets, given that the sport's role in the Olympics dates back to the ancient Olympic Games, and was included in the original program for the modern Games. The New York Times felt that the decision was based on the shortage of well-known talent and the absence of women's events in the sport. Out of the shortlist from the IOC vote, Wrestling was duly added to the shortlist of applicants for inclusion in the 2020 Games, alongside the seven new sports that were put forward for consideration.On 29 May 2013, it was announced that three of the eight sports under consideration had made the final shortlist: baseball/softball, squash and wrestling. The other five sports were rejected at this point: karate, roller sports, sport climbing, wakeboarding, and wushu. At the 125th IOC Session on 8 September 2013, wrestling was chosen to be included in the Olympic program for 2020 and 2024. Wrestling secured 49 votes, while baseball/softball and squash received 24 votes and 22 votes respectively.With the adoption of the Olympic Agenda 2020 in December 2014, the IOC shifted from a "sport-based" approach to the Olympic program to an "event-based" programβ€”establishing that organizing committees may propose discretionary events to be included in the program to improve local interest. As a result of these changes, a shortlist of eight new proposed sports was unveiled on 22 June 2015, consisting of baseball/softball, bowling, karate, roller sports, sport climbing, squash, surfing, and wushu. On 28 September 2015, the Tokyo Organizing Committee submitted their shortlist of five proposed sports to the IOC: baseball/softball, karate, sport climbing, surfing, and skateboarding. These five new sports were approved on 3 August 2016 by the IOC during the 129th IOC Session in Rio de Janeiro, Brazil, and were included in the sports program for 2020 only, bringing the total number of sports at the 2020 Olympics to 33.

Question: How many new events were introduced for the 2020 Summer Olympics?
Answer:

Prompt-2:

Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I don't know."

Context:
* World Heritage Convention Summary  The World Heritage Convention, formally the Convention Concerning the Protection of the World Cultural and Natural Heritage, is an international treaty signed on 23 November 1972, which created the World Heritage Sites, with the primary goals of nature conservation and the preservation of cultural properties. The convention, a signed document of international agreement, guides the work of the World Heritage Committee. It was developed over a seven-year period (1965–1972). The convention defines which sites which can be considered for inscription on the World Heritage List, sets out the duties of each country's governments to identify potential sites and to protect and preserve them. Signatory countries pledge to conserve the World Heritage sites situated on their territory, and report regularly on the state of their conservation. The convention also sets out how the World Heritage Fund is to be used and managed.It was adopted by the General Conference of UNESCO on 16 November 1972, and signed by the President of General Conference of UNESCO, Toru Haguiwara, and the Director-General of UNESCO, RenΓ© Maheu, on 23 November 1972. It is held in the archives of UNESCO.
* Former UNESCO World Heritage Sites Summary  World Heritage Sites may lose their designation when the UNESCO World Heritage Committee determines that they are not properly managed or protected. The committee can place a site it is concerned about on its list of World Heritage in Danger of losing its designation, and attempts to negotiate with the local authorities to remedy the situation. If remediation fails, the committee then revokes its designation. A country may also request to reduce the boundaries of one of its existing sites, in effect partially or fully delisting such properties. Under the World Heritage guidelines, a country must report to the committee whenever one of its properties "inscribed on the World Heritage List has seriously deteriorated, or when the necessary corrective measures have not been taken."Three sites have been completely delisted from the World Heritage List: the Arabian Oryx Sanctuary in Oman, the Dresden Elbe Valley in Germany and Liverpool Maritime Mercantile City in the United Kingdom.
* Natural heritage Legal status  An important site of natural heritage or cultural heritage can be listed as a World Heritage Site by the World Heritage Committee of UNESCO.  The UNESCO programme, catalogues, names, and conserves sites of outstanding cultural or natural importance to the common heritage of humanity.  As of March 2012, there are 936 World Heritage Sites: 725 cultural, 183 natural, and 28 mixed properties, in 153 countries. The 1972 UNESCO World Heritage Convention established that biological resources, such as plants, were the common heritage of mankind or as was expressed in the preamble: "need to be preserved as part of the world heritage of mankind as a whole". These rules probably inspired the creation of great public banks of genetic resources, located outside the source-countries. New global agreements (e.g., the Convention on Biological Diversity), national rights over biological resources (not property). The idea of static  conservation of biodiversity is disappearing and being replaced by the idea of dynamic conservation, through the notion of resource and  innovation. The new agreements commit countries to conserve biodiversity, develop resources for sustainability and share the benefits resulting from their use. Under new rules, it is expected  that bioprospecting or collection of natural products has to be allowed by  the biodiversity-rich country, in exchange for a share of the  benefits. In 2005, the World Heritage Marine Programme was established to protect marine areas with Outstanding Universal Values.
* List of World Heritage Sites in Western Asia Legend  Site; named after the World Heritage Committee's official designation Location; at city, regional, or provincial level and geocoordinates Criteria; as defined by the World Heritage Committee Area; in hectares and acres. If available, the size of the buffer zone has been noted as well. A value of zero implies that no data has been published by UNESCO Year; during which the site was inscribed to the World Heritage List Description; brief information about the site, including reasons for qualifying as an endangered site, if applicable
* World Heritage Committee Summary  The World Heritage Committee is a committee of the United Nations Educational, Scientific and Cultural Organization that selects the sites to be listed as UNESCO World Heritage Sites, including the World Heritage List and the List of World Heritage in Danger, defines the use of the World Heritage Fund and allocates financial assistance upon requests from States Parties. It comprises representatives from 21 state parties that are elected by the General Assembly of States Parties for a four-year term. These parties vote on decisions and proposals related to the World Heritage Convention and World Heritage List.  According to the World Heritage Convention, a committee member's term of office is six years. However many States Parties choose to voluntarily limit their term to four years, in order to give other States Parties an opportunity to serve.  All members elected at the 15th General Assembly (2005) voluntarily chose to reduce their term of office from six to four years.Deliberations of the World Heritage Committee are aided by three advisory bodies, the IUCN, ICOMOS and ICCROM.

Question: What are the minimum criteria to be declared as the world heritage site?
Answer:

Prompt-3:

Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I don't know."

Context:
* Russo-Ukrainian War August 2014 Russian invasion  After a series of military defeats and setbacks for the separatists, who united under the banner of "Novorossiya", Russia dispatched what it called a "humanitarian convoy" of trucks across the border in mid-August 2014. Ukraine called the move a "direct invasion". Ukraine's National Security and Defence Council reported that convoys were arriving almost daily in November (up to 9 convoys on 30 November) and that their contents were mainly arms and ammunition. Strelkov claimed that in early August, Russian servicemen, supposedly on "vacation" from the army, began to arrive in Donbas.By August 2014, the Ukrainian "Anti-Terrorist Operation" shrank the territory under pro-Russian control, and approached the border. Igor Girkin urged Russian military intervention, and said that the combat inexperience of his irregular forces, along with recruitment difficulties amongst the local population, had caused the setbacks. He stated, "Losing this war on the territory that President Vladimir Putin personally named New Russia would threaten the Kremlin's power and, personally, the power of the president".In response to the deteriorating situation, Russia abandoned its hybrid approach, and began a conventional invasion on 25 August 2014. On the following day, the Russian Defence Ministry said these soldiers had crossed the border "by accident". According to Nikolai Mitrokhin's estimates, by mid-August 2014 during the Battle of Ilovaisk, between 20,000 and 25,000 troops were fighting in the Donbas on the separatist side, and only 40-45% were "locals".On 24 August 2014, Amvrosiivka was occupied by Russian paratroopers, supported by 250 armoured vehicles and artillery pieces. The same day, Ukrainian President Petro Poroshenko referred to the operation as Ukraine's "Patriotic War of 2014" and a war against external aggression. On 25 August, a column of Russian military vehicles was reported to have crossed into Ukraine near Novoazovsk on the Azov sea coast. It appeared headed towards Ukrainian-held Mariupol, in an area that had not seen pro-Russian presence for weeks. Russian forces captured Novoazovsk. and Russian soldiers began deporting Ukrainians who did not have an address registered within the town. Pro-Ukrainian anti-war protests took place in Mariupol. The UN Security Council called an emergency meeting.  The Pskov-based 76th Guards Air Assault Division allegedly entered Ukrainian territory in August and engaged in a skirmish near Luhansk, suffering 80 dead. The Ukrainian Defence Ministry said that they had seized two of the unit's armoured vehicles near Luhansk, and reported destroying another three tanks and two armoured vehicles in other regions. The Russian government denied the skirmish took place, but on 18 August, the 76th was awarded the Order of Suvorov, one of Russia's highest awards, by Russian minister of defence Sergey Shoigu for the "successful completion of military missions" and "courage and heroism".The speaker of Russia's upper house of parliament and Russian state television channels acknowledged that Russian soldiers entered Ukraine, but referred to them as "volunteers". A reporter for Novaya Gazeta, an opposition newspaper in Russia, stated that the Russian military leadership paid soldiers to resign their commissions and fight in Ukraine in the early summer of 2014, and then began ordering soldiers into Ukraine. 
Russian opposition MP Lev Shlosberg made similar statements, although he said combatants from his country are "regular Russian troops", disguised as units of the DPR and LPR.In early September 2014, Russian state-owned television channels reported on the funerals of Russian soldiers who had died in Ukraine, but described them as "volunteers" fighting for the "Russian world". Valentina Matviyenko, a top United Russia politician, also praised "volunteers" fighting in "our fraternal nation". Russian state television for the first time showed the funeral of a soldier killed fighting in Ukraine.
* Russo-Ukrainian War Full-scale Russian invasion of Ukraine (2022–2023)  The 2022 Russian invasion of Ukraine began on the morning of 24 February, when Putin announced a "special military operation" to "demilitarise and denazify" Ukraine. Minutes later, missiles and airstrikes hit across Ukraine, including Kyiv, shortly followed by a large ground invasion along multiple fronts. Zelenskyy declared martial law and a general mobilisation of all male Ukrainian citizens between 18 and 60, who were banned from leaving the country.Russian attacks were initially launched on a northern front from Belarus towards Kyiv, a north-eastern front towards Kharkiv, a southern front from Crimea, and a south-eastern front from Luhansk and Donetsk. In the northern front, amidst heavy losses and strong Ukrainian resistance surrounding Kyiv, Russia's advance stalled in March, and by April its troops retreated. On 8 April, Russia placed its forces in southern and eastern Ukraine under the command of General Aleksandr Dvornikov, and some units withdrawn from the north were redeployed to the Donbas. On 19 April, Russia launched a renewed attack across a 500 kilometres (300 mi) long front extending from Kharkiv to Donetsk and Luhansk. By 13 May, a Ukraine counter-offensive had driven back Russian forces near Kharkiv. By 20 May, Mariupol fell to Russian troops following a prolonged siege of the Azovstal steel works. Russian forces continued to bomb both military and civilian targets far from the frontline. The war caused the largest refugee and humanitarian crisis within Europe since the Yugoslav Wars in the 1990s; the UN described it as the fastest-growing such crisis since World War II. In the first week of the invasion, the UN reported over a million refugees had fled Ukraine; this subsequently rose to over 7,405,590 by 24 September, a reduction from over eight million due to some refugees' return.Ukrainian forces launched counteroffensives in the south in August, and in the northeast in September. On 30 September, Russia annexed four oblasts of Ukraine which it had partially conquered during the invasion. This annexation was generally unrecognized and condemned by the countries of the world. After Putin announced that he would begin conscription drawn from the 300,000 citizens with military training and potentially the pool of about 25 million Russians who could be eligible for conscription, one-way tickets out of the country nearly or completely sold out. The Ukrainian offensive in the northeast successfully recaptured the majority of Kharkiv Oblast in September. In the course of the southern counteroffensive, Ukraine retook the city of Kherson in November and Russian forces withdrew to the east bank of the Dnieper River.The invasion was internationally condemned as a war of aggression. A United Nations General Assembly resolution demanded a full withdrawal of Russian forces, the International Court of Justice ordered Russia to suspend military operations and the Council of Europe expelled Russia. Many countries imposed new sanctions, which affected the economies of Russia and the world, and provided humanitarian and military aid to Ukraine. In September 2022, Putin signed a law that would punish anyone who resists conscription with a 10-year prison sentence resulting in an international push to allow asylum for Russians fleeing conscription.According to The New York Times, as of February 2023, the "number of Russian troops killed and wounded in Ukraine is approaching 200,000."

Question: What is the impact of the russian-ukraine war on the summer olympics 2020?
Answer:

The answer generation:

import gc
import torch

gc.collect()
torch.cuda.empty_cache()

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generated_ids = model.generate(input_ids,
                               do_sample=True,
                               temperature=0.9,
                               max_new_tokens=200,
                               num_beams=4,
                               top_k=4)
generated_text = tokenizer.decode(generated_ids[0])
print(generated_text)

NOTE: Please copy and paste prompts 1, 2, and 3 and test. After the first prompt, the generation should give you the OOM error.

I know I am asking a lot and I feel bad about this.
Please help.
Thank you mate!

are you on EC2? if so, what instance size/type?

Hello @silvacarl ,
I am using:

For 24GB configuration

  • 1 x RTX 3090 β€” 24 GB VRAM
  • 9 vCPU AMD β€” 35 GB RAM

For 48GB configuration

  • 1 x RTX A6000 β€” 48 GB VRAM
  • 9 vCPU AMD β€” 71 GB RAM

As said in my question, I have even tried the 48 GB configuration but am still getting the OOM error. And now I am renting GPUs from runpod.io instead of AWS because of the ease of access and lower cost. So, yeah, I am no longer on EC2 or SageMaker.

Thanks πŸ™

you shouldn't get this. a g4dn.2xlarge EC2 can run it, i have done it. the RTX 3090 has 24 GB and the RTX A6000 has 48 GB. something else has to be using the GPU memory.

Hello @silvacarl ,
I know I am being too irritating about this silly issue, but I am looking for help if possible.
Here I am sharing a small (< 1min) video walkthrough of the "exact" steps I follow to get the error.

Video: https://drive.google.com/file/d/1ZPk9KzR-1l0nIkB69N1ERC2dut-ozg5S/view?usp=share_link


Unfortunately, the error appeared on the first generation run itself. Otherwise, I could have shown that generation works fine for prompt 1, prompt 2, ... and that only after a series of past generations does the next generation cause the error, but I couldn't show that in the video because, for some reason, it happened on the first run itself.

I am renting the GPUs from runpod.io instead of AWS because of the ease of access and lower cost

i have never heard of these guys so i cannot comment. Use AWS, or Coreweave, or Paperspace, or Lambda. they all work.

Thanks for your response @silvacarl !
I think I have found a solution (for now). Actually, I was using the following configuration for generation:

temperature=0.9, 
min_length=15,
early_stopping=True,
num_beams=4,
no_repeat_ngram_size=2,
top_k=40, 
top_p=0.7,
max_new_tokens=200,
penalty_alpha=0.6, # This is the problem
use_cache=False,
pad_token_id=tokenizer.eos_token_id)

I tried disabling all the parameters and then testing generation by enabling them one by one. After many trials, I have confirmed that penalty_alpha uses more resources when enabled. To confirm this, I disabled only that parameter and ran the generation without it; then in the next run I enabled it and started getting OOM errors.

Similarly, increasing num_beams to, say, 8-10 also causes OOM errors, because from my understanding num_beams keeps track of n candidate sequences (from the source). So raising that number has the potential to cause the error.
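A rough back-of-envelope estimate of why this adds up (my own numbers, assuming fp16 GPT-J-6B with 28 layers and hidden size 4096):

layers, hidden, fp16_bytes = 28, 4096, 2

# keys + values cached per token, across all layers
kv_per_token = 2 * hidden * fp16_bytes * layers            # ~448 KiB
tokens = 1700 + 200                                        # prompt + generated

per_sequence = kv_per_token * tokens / 1024**3
print(f"KV cache per sequence: ~{per_sequence:.1f} GiB")      # ~0.8 GiB
print(f"with num_beams=4:      ~{4 * per_sequence:.1f} GiB")  # ~3.2 GiB

And that is only the KV cache; the attention activations also scale with the number of beams, so a long prompt plus several beams plus penalty_alpha quickly eats whatever is left after the ~12 GB of weights.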


For now, I have settled on num_beams=3-6 and keep the other parameters variable.
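If it helps anyone else hitting this, a lighter configuration along these lines should use much less memory (just a sketch: plain sampling instead of beam search, no penalty_alpha):

output_tokens = model.generate(**batch,
                               do_sample=True,          # sampling instead of beam search
                               temperature=0.9,
                               top_k=40, top_p=0.7,
                               max_new_tokens=200,
                               no_repeat_ngram_size=2,
                               pad_token_id=tokenizer.eos_token_id)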
Thanks to both of you for your constant support and this loooong thread πŸ€—
