- Implemented a new script `val_test.py` to analyze classification results from a JSONL file.
- Extracted true labels and predicted responses, handling invalid entries gracefully.
- Generated a classification report with accuracy metrics and detailed statistics for each category.
- Added functionality to export results to CSV and save analysis reports.
- Included visualizations of the confusion matrix and the per-category accuracy distribution.
- Ensured categories are determined dynamically from the input data.
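
For reviewers, the core flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `val_test.py`: the field names `label` and `response` and the helper names `load_pairs` and `summarize` are assumptions made for the example, and it uses only the standard library instead of whatever reporting and plotting packages the script relies on.

```python
import json
from collections import Counter, defaultdict


def load_pairs(lines, label_key="label", response_key="response"):
    """Return (true_labels, predictions, n_skipped) from an iterable of JSONL lines.

    The key names are hypothetical; adjust them to the real record schema.
    """
    y_true, y_pred, skipped = [], [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines are ignored, not counted as invalid
        try:
            record = json.loads(line)
            label, response = record[label_key], record[response_key]
        except (json.JSONDecodeError, KeyError, TypeError):
            skipped += 1  # malformed JSON, missing field, or non-object record
            continue
        if not isinstance(label, str) or not isinstance(response, str):
            skipped += 1  # wrong value type
            continue
        y_true.append(label.strip())
        y_pred.append(response.strip())
    return y_true, y_pred, skipped


def summarize(y_true, y_pred):
    """Return overall accuracy, per-category accuracy, and a confusion table."""
    categories = sorted(set(y_true))  # categories come from the data itself
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    confusion = defaultdict(Counter)  # confusion[true_label][predicted_label]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1
    per_category = {c: correct[c] / total[c] for c in categories}
    accuracy = sum(correct.values()) / len(y_true) if y_true else 0.0
    return accuracy, per_category, confusion
```

Invalid records are counted instead of raising, so a single bad line does not abort the analysis, and the category list is rebuilt from whatever labels appear in the file.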