Spaces: FoodDesert/Prompt_Squirrel

FoodDesert committed on Apr 4, 2024

Commit 5232b3e (verified) · 1 parent: b754714

Upload app.py

Fixed an issue where artist suggestions were not reflecting tags containing escape characters.
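
To make the fix concrete, here is a minimal sketch of the two tag normalizations this commit ends up with. The helper names and the sample tag are hypothetical; only the `replace` chains are taken from the diff. It also assumes the artist matrix was built from tags that keep their escaped parentheses, which the commit message implies but does not state.

```python
def display_form(tag_text: str) -> str:
    # Mirrors `modified_tag` in the commit: underscores become spaces and
    # escaped parentheses are unescaped.
    return tag_text.replace('_', ' ').replace('\\(', '(').replace('\\)', ')').strip()


def artist_matrix_form(tag_text: str) -> str:
    # Mirrors the effect of `artist_matrix_tag`: underscores become spaces,
    # escaped parentheses stay escaped. In Python source, '\(' is not a
    # recognized escape, so it is the same two-character string as '\\(',
    # and the commit's replace('\\(', '\(') leaves the text unchanged.
    return tag_text.replace('_', ' ').strip()


# The raw-string spelling avoids the invalid-escape warning newer Pythons emit.
assert r'\(' == '\\('

# Hypothetical prompt tag; the backslashes are part of the tag text.
raw_tag = ' some_artist_\\(artist\\) '
print(display_form(raw_tag))        # some artist (artist)
print(artist_matrix_form(raw_tag))  # some artist \(artist\)
```

If the escaped form is what the artist vectorizer saw at fit time, only the second variant can match its vocabulary, which is consistent with the behavior the commit message describes.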

Files changed (1)

app.py +9 -5

@@ -34,7 +34,7 @@ Some models react best when prompted with verbose scene descriptions akin to DAL
 This tool serves as a linguistic bridge to the e621 image board tag lexicon, on which many popular models such as Fluffyrock, Fluffusion, and Pony Diffusion v6 were trained.
 When you enter a txt2img prompt and press the "submit" button, the Tagset Completer parses your prompt and checks that all your tags are valid e621 tags.
-If it finds any that are not, it recommends some valid e621 tags you can use to replace them in the "Unseen Tags" table.
+If it finds any that are not, it recommends some valid e621 tags you can use to replace them in the "Unknown Tags" section.
 Additionally, in the "Top Artists" text box, it lists the artists who would most likely draw an image having the set of tags you provided.
 This is useful to align your prompt with the expected input to an e621-trained model.
@@ -52,7 +52,7 @@ Yes, but only '(' and ')' and numerical weights, and all of these things are ign
 An example that illustrates acceptable parentheses and weight formatting is:
 ((sunset over the mountains)), (clear sky:1.5), ((eagle flying high:2.0)), river, (fish swimming in the river:1.2), (campfire, (marshmallows:2.1):1.3), stars in the sky, ((full moon:1.8)), (wolf howling:1.7)
-## Why are some valid tags marked as "unseen", and why don't some artists ever get returned?
+## Why are some valid tags marked as "unknown", and why don't some artists ever get returned?
 Some data is excluded from consideration if it did not occur frequently enough in the sample from which the application makes its calculations.
 If an artist or tag is too infrequent, we might not think we have enough data to make predictions about it.
@@ -479,6 +479,7 @@ def build_tag_offsets_dicts(new_image_tags_with_positions):
     for tag_text, start_pos in new_image_tags_with_positions:
         # Modify the tag
         modified_tag = tag_text.replace('_', ' ').replace('\\(', '(').replace('\\)', ')').strip()
+        artist_matrix_tag = tag_text.replace('_', ' ').replace('\\(', '\(').replace('\\)', '\)').strip()
         # Calculate the end position based on the original tag length
         end_pos = start_pos + len(tag_text)
         # Append the structured data for each tag
@@ -486,7 +487,8 @@ def build_tag_offsets_dicts(new_image_tags_with_positions):
             "original_tag": tag_text,
             "start_pos": start_pos,
             "end_pos": end_pos,
-            "modified_tag": modified_tag
+            "modified_tag": modified_tag,
+            "artist_matrix_tag": artist_matrix_tag
         })
     return tag_data
@@ -508,8 +510,10 @@ def find_similar_artists(original_tags_string, top_n, similarity_weight, allow_n
         bad_tags_illustrated_string = {"text":new_tags_string, "entities":bad_entities}
         #bad_tags_illustrated_string = {"text":original_tags_string, "entities":bad_entities}
-        modified_tags = [tag_info['modified_tag'] for tag_info in tag_data]
-        X_new_image = vectorizer.transform([','.join(modified_tags + removed_tags)])
+        #modified_tags = [tag_info['modified_tag'] for tag_info in tag_data]
+        #X_new_image = vectorizer.transform([','.join(modified_tags + removed_tags)])
+        artist_matrix_tags = [tag_info['artist_matrix_tag'] for tag_info in tag_data]
+        X_new_image = vectorizer.transform([','.join(artist_matrix_tags + removed_tags)])
         similarities = cosine_similarity(X_new_image, X_artist)[0]
         top_artist_indices = np.argsort(similarities)[-(top_n + 1):][::-1]
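
The effect of the last hunk on the artist ranking can be sketched as follows. This is an illustration under stated assumptions, not the code in `app.py`: it swaps plain tag-count vectors and a hand-written cosine for the app's fitted `vectorizer` and `cosine_similarity`, and the artist profiles and helper names are hypothetical. It assumes the artist profiles store parentheses in escaped form.

```python
import math
from collections import Counter


def tag_counts(tag_string: str) -> Counter:
    # Split on commas and count each stripped, non-empty tag.
    return Counter(t.strip() for t in tag_string.split(',') if t.strip())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical artist profiles; tags keep their escaped parentheses.
artist_profiles = {
    'artist_a': tag_counts('forest, canine \\(species\\), night'),
    'artist_b': tag_counts('beach, sunset'),
}


def top_artists(prompt_tags: str, top_n: int) -> list:
    # Score every artist against the prompt and keep the best top_n,
    # highest similarity first.
    query = tag_counts(prompt_tags)
    scored = [(cosine(query, profile), name)
              for name, profile in artist_profiles.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


escaped = top_artists('forest, canine \\(species\\)', 1)
unescaped = top_artists('forest, canine (species)', 1)
print(escaped[0], unescaped[0])
```

With the escaped query both prompt tags overlap with `artist_a`; with the unescaped query only `forest` does, so the parenthesized tag contributes nothing to the score. Note that the original line slices `top_n + 1` entries from `np.argsort`, one more than this sketch returns.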