Update token chunk size input in app.py
Adds a number input for the chunk size target, with help text and an expander that explains how the target works. The target is a guideline rather than a hard limit: the chunker keeps to logical breakpoints in the code, so chunks may come out somewhat smaller or larger than the chosen value.
app.py
CHANGED
@@ -55,7 +55,20 @@ def get_language_by_extension(file_extension):
 
 language = get_language_by_extension(file_extension)
 
-
+st.write("""
+### Choose Chunk Size Target""")
+token_chunk_size = st.number_input('Target Chunk Size Target', min_value=5, max_value=1000, value=25, help="The token limit guides the chunk size in tokens (tiktoken, gpt-4), aiming for readability without enforcing a strict upper limit.")
+
+with st.expander("Learn more about the chunk size target"):
+    st.markdown("""
+    The `token_limit` parameter in the `chunk` function serves as a guideline to optimize the size of code chunks produced. It is not a hard limit but rather an ideal target, attempting to achieve a balance between chunk size and maintaining logical coherence within the code.
+
+    - **Adherence to Logical Breakpoints:** The chunking logic respects logical breakpoints in the code, ensuring that chunks are coherent and maintain readability.
+    - **Flexibility in Chunk Size:** Chunks might be slightly smaller or larger than the specified `token_limit` to avoid breaking the code in the middle of logical sections.
+    - **Handling Final Chunks:** The last chunk of code captures any remaining code, which may vary significantly in size depending on the remaining code's structure.
+
+    This approach allows for flexibility in how code is segmented into chunks, emphasizing the balance between readable, logical code segments and size constraints.
+    """)
 
 original_col, chunked_col = st.columns(2)
 
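The expander text describes `token_limit` as a soft target that yields to logical breakpoints. The sketch below illustrates one possible policy consistent with that description; it is not the `chunk` function from this repository. The names `chunk_with_soft_limit`, `count_tokens_whitespace`, and the `breakpoints` argument are hypothetical, and tokens are approximated by whitespace-separated words rather than tiktoken so the example has no dependencies.

```python
from typing import Callable, List


def count_tokens_whitespace(text: str) -> int:
    """Stand-in token counter (assumption): whitespace-separated words.

    The app's help text mentions tiktoken (gpt-4); a real counter could be
    passed in through the `count_tokens` argument instead.
    """
    return len(text.split())


def chunk_with_soft_limit(
    lines: List[str],
    breakpoints: List[int],
    token_limit: int,
    count_tokens: Callable[[str], int] = count_tokens_whitespace,
) -> List[str]:
    """Group lines into chunks, cutting only at logical breakpoints.

    `breakpoints` lists the line indices where a new logical section starts
    (hypothetical input; a real chunker would derive these from the code).
    """
    cut_points = set(breakpoints)
    chunks: List[str] = []
    start = 0   # index of the first line of the chunk being built
    tokens = 0  # running token count of the chunk being built

    for i, line in enumerate(lines):
        # Cut only at a breakpoint, and only once the target has been reached.
        # A chunk can therefore overshoot `token_limit`, but it is never
        # split in the middle of a logical section.
        if i in cut_points and i > start and tokens >= token_limit:
            chunks.append("\n".join(lines[start:i]))
            start, tokens = i, 0
        tokens += count_tokens(line)

    # The final chunk takes whatever is left, whatever its size.
    if start < len(lines):
        chunks.append("\n".join(lines[start:]))
    return chunks


if __name__ == "__main__":
    source = [
        "import os",
        "",
        "def a():",
        "    return 1",
        "",
        "def b():",
        "    return 2 + 3",
    ]
    for n, piece in enumerate(chunk_with_soft_limit(source, [2, 5], token_limit=3), 1):
        print(f"--- chunk {n} ---")
        print(piece)
```

Under this policy a chunk closes at the first breakpoint reached after the running count meets the target, so the target behaves as a floor that can be overshot. A policy that looks ahead and closes a chunk early when the next section would overshoot is equally compatible with the description above.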