Nicholai
nmitchko's activity
Here is how Logic-of-Thought (LoT) is implemented, step by step:
-- 1. Logic Extraction
1. Use Large Language Models (LLMs) to identify sentences containing conditional reasoning relationships from the input context.
2. Generate a collection of sentences with logical relationships.
3. Use LLMs to extract the set of propositional symbols and logical expressions from the collection.
4. Identify propositions with similar meanings and represent them using identical propositional symbols.
5. Analyze the logical relationships between propositions based on their natural language descriptions.
6. Add negation (¬) for propositions that express opposite meanings.
7. Use implication (→) to connect propositional symbols when a conditional relationship exists.
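For concreteness, here is a purely illustrative toy example (mine, not the paper's) of what Logic Extraction might hand off to the next phase, using '~' for negation and '->' for implication:

# Illustrative only: a toy symbol table and expression set standing in for the
# output of Logic Extraction (the actual extraction is done by prompting an LLM).
symbols = {
    "A": "it is raining",
    "B": "the ground is wet",
    "C": "the picnic goes ahead",
}
expressions = {"A -> B", "B -> ~C"}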
-- 2. Logic Extension
1. Apply logical reasoning laws to the collection of logical expressions from the Logic Extraction phase.
2. Use a Python program to implement logical deduction and expand the expressions (a sketch of this step follows below).
3. Apply logical laws such as Double Negation, Contraposition, and Transitivity to derive new logical expressions.
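Here is a minimal sketch of the deduction step from item 2, under my own assumptions: expressions are plain strings in the format of the toy example above, and the loop simply closes the set under Double Negation, Contraposition, and Transitivity.

def extend_logic(expressions):
    """Expand expressions like 'A -> B' and 'B -> ~C' by applying Double Negation,
    Contraposition, and Transitivity until no new expression appears."""
    def negate(p):
        # Double Negation: negating '~X' yields 'X' rather than '~~X'.
        return p[1:] if p.startswith("~") else "~" + p

    exprs = set(expressions)
    changed = True
    while changed:
        changed = False
        implications = [e.split(" -> ") for e in exprs if " -> " in e]
        new = set()
        for a, b in implications:
            new.add(f"{negate(b)} -> {negate(a)}")  # Contraposition: A -> B gives ~B -> ~A
        for a, b in implications:
            for c, d in implications:
                if b == c:
                    new.add(f"{a} -> {d}")          # Transitivity: A -> B and B -> C give A -> C
        if not new.issubset(exprs):
            exprs |= new
            changed = True
    return exprs

On the toy example, extend_logic({"A -> B", "B -> ~C"}) derives, among others, "A -> ~C" and the contrapositive "C -> ~B".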
-- 3. Logic Translation
1. Use LLMs to translate the newly generated logical expressions into natural language descriptions.
2. Combine the natural language descriptions of propositional symbols according to the extended logical expressions.
3. Incorporate the translated logical information as a new part of the original input prompt.
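A sketch of the translation step, again using the same assumed string format. The paper has an LLM do this rewriting; the hand-rolled stand-in below is only meant to show the data flow from symbols back to sentences.

def translate_logic(expressions, symbols):
    """Render expressions such as 'A -> ~C' as sentences, using the {symbol: proposition}
    table built during Logic Extraction."""
    def describe(p):
        return ("it is not the case that " + symbols[p[1:]]) if p.startswith("~") else symbols[p]

    return " ".join(
        f"If {describe(a)}, then {describe(b)}."
        for a, b in (e.split(" -> ") for e in sorted(expressions) if " -> " in e)
    )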
-- 4. Integration with Existing Prompting Methods
1. Combine the LoT-generated logical information with the original prompt.
2. Use this enhanced prompt with existing prompting methods like Chain-of-Thought (CoT), Self-Consistency (SC), or Tree-of-Thoughts (ToT).
3. Feed the augmented prompt to the LLM to generate the final answer.
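The integration itself is essentially string assembly: append the translated hints to the original prompt and hand the result to your prompting method of choice. The "Let's think step by step." suffix below is the standard zero-shot CoT trigger; the exact wrapper wording is my own, not necessarily the paper's.

def lot_augmented_prompt(original_prompt, translated_logic):
    """Combine the original input with the LoT-derived hints before CoT/SC/ToT prompting."""
    return (
        original_prompt
        + "\n\nAdditional logical information: " + translated_logic
        + "\n\nLet's think step by step."
    )

# e.g. (with a hypothetical `question` string):
# prompt = lot_augmented_prompt(question, translate_logic(extend_logic(expressions), symbols))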
What do you think about LoT?
How to fine-tune?
Slow inference
finger pop in mouth, session drummer on acoustic set
@Kearm apologies, typing from my phone. I'll test later tonight.
zero_index = (unique_vals == 0).nonzero()  # index of the zero value among the unique values
zero_count = temp_counts[zero_index]       # how many components of the difference were exactly zero
@lmg-anon @Kearm, the code snippet above should be fixed now. I was confusing what torch.unique returns.
The unique function returns both the values and their counts. zero_index in the snippet above indexes into the counts tensor, and the resulting zero count is what we compare against the previous iteration.
And thinking through this code, there's got to be a better way to check if two sets of activation tensors overlap, and pick the ones that overlap the least.
Thank you!
I will try it ASAP when I have the opportunity; very interesting.
I just fixed an obvious bug in that snippet, so feel free to respond if something isn't working as expected :)
Perhaps you're not thinking of the layer selection correctly. Pick the layers that have a different cached PC1 for your positive and negative dataset as featured in the article.
You could just cycle through them when doing the ablation:
I didn't get a chance to test this code, but it should work.
import torch

# Assumes harmful_cache / harmless_cache are activation caches built upstream
# (e.g. via TransformerLens run_with_cache on the harmful and harmless datasets).
pos = -1                 # last token position
start_layer = 14
end_layer = 40
final_layer = start_layer

## First computation
harmful_mean_act = harmful_cache['resid_pre', start_layer][:, pos, :].mean(dim=0)
harmless_mean_act = harmless_cache['resid_pre', start_layer][:, pos, :].mean(dim=0)
activation_difference = harmful_mean_act - harmless_mean_act
refusal_dir = activation_difference / activation_difference.norm()

counts = 0
for layer in range(start_layer + 1, end_layer):
    temp_harmful_mean_act = harmful_cache['resid_pre', layer][:, pos, :].mean(dim=0)
    temp_harmless_mean_act = harmless_cache['resid_pre', layer][:, pos, :].mean(dim=0)
    temp_activation_difference = temp_harmful_mean_act - temp_harmless_mean_act

    # torch.unique returns the unique values and their counts; the count at the index
    # of the zero value is the number of components that match exactly (float equality).
    unique_vals, temp_counts = torch.unique(
        activation_difference - temp_activation_difference, return_counts=True, dim=0)
    zero_index = (unique_vals == 0).nonzero()
    zero_count = int(temp_counts[zero_index].sum())  # 0 if no component matches

    if zero_count > counts:
        refusal_dir = temp_activation_difference / temp_activation_difference.norm()
        activation_difference = temp_activation_difference
        counts = zero_count
        final_layer = layer
Un-censoring a model is a first pass at the broader goal of knowledge programming in these AI models: we should soon have simple, inference-level ways to map a model's internal layers into a "thought space" and to fine-tune or adjust arbitrary information in and out of it. This has immense applications for industry settings with non-public data and limited training resources, and this work is thoroughly saluted.