Recreating Results of Paper

#6
by amon05U - opened

Hi folks, thanks for the interesting work. I am trying to understand and regenerate the results of the paper. I have a few queries below; it would be of great help if you could address them:

  1. For Fig. 3, as mentioned in Section 8A (Downstream Task Evaluation): where is res1dcnn.py used in the provided code? I am not able to find it.
  2. Referring to step 9 in the Hugging Face wiki, once we get the dataset after performing inference, do we use it for anything else? If so, what do we use it for?
  3. Referring to step 10 in the Hugging Face wiki, how can we generate labels using raw or embedded data? Right now, we always get the same labels (for 64 beams) whether we use raw data or embedded data. Am I missing something or doing something wrong here?
  4. We see an F1 variable in input_preprocess.py under the method label_gen(). Is my understanding correct that this is different from the F1-score between predicted and ground-truth labels? If so, why does this F1 contain complex values?

Thank you!

Hello @amon05U,

Thank you for your interest and insightful questions. Please find our clarifications below:

  1. The res1dcnn.py file is provided to show the downstream model used for beam prediction. You can use it directly for the beam prediction task with raw or embedded channels and their corresponding labels (a minimal training sketch follows this list).
  2. The inference step in the Hugging Face page provides embeddings that can be used to improve downstream tasks, especially with limited data. In real-world wireless communication, labeled data is often scarce, making it difficult to train models effectively. LWM has been pre-trained on diverse datasets across different environments (rural, downtown, urban, etc.), enabling it to learn intrinsic wireless channel patterns. If you apply LWM to a dataset with limited labeled samples, it extracts refined representations that can improve downstream task performance. After performing inference on the original raw channel dataset, you obtain an embedding for each channel, which can then be used in downstream tasks for better efficiency and generalization.
  3. The labels remain the same whether using raw or embedded data since embeddings are a refined representation of the original data. The transformation into embeddings does not alter the labels but helps the downstream model learn patterns more effectively, making it easier to distinguish relevant features.
  4. The "F1" variable in label_gen() does not represent an F1-score but rather a beamforming codebook. This codebook is used to assign beams to channels, not to measure model performance. The confusion arises from the naming, but it does not relate to the F1-score typically used for classification evaluation. Additionally, the presence of complex values is due to its role in beamforming computations rather than classification metrics.

We will release a new version of LWM next week, making downstream model training much easier. We will also share videos that address your questions.

We appreciate your careful inspection of the code and valuable feedback! Let us know if anything needs further clarification.

Thanks for your detailed response. Much appreciated. Just a quick question:

I looked at the "tutorial.py" file. It seems f1_scores in it represents the F1-score used in ML classification. How is this F1-score related to the one you are mentioning in [Point 4] above?
