Cognitive Neuroscience Research Report [20260097]
Table of Contents

- Detailed System Principle for Mathematical Reasoning Training
  - 1. Overall Goal and Design Philosophy
  - 2. Data Generation Module Core Principles
    - 2.1 Random Expression Generation (Exemplified by Derivatives)
    - 2.2 Step-by-Step Calculation Generation
    - 2.3 Code Construction Template
  - 3. Special Handling for Each Problem Type
    - 3.1 Arithmetic
    - 3.2 Integration
    - 3.3 Limits
    - 3.4 Linear Algebra
    - 3.5 Sequence Limits (Limsup/Liminf)
  - 4. Data Aggregation Script Principles
    - 4.1 Weighted Mixed Generation
    - 4.2 Resume from Checkpoint
    - 4.3 Timeout Control
    - 4.4 Progress and Error Handling
  - 5. LoRA Fine-Tuning Principles
    - 5.1 LoRA Overview
    - 5.2 Adaptation to DeepSeek-Coder
    - 5.3 4-bit Quantisation
    - 5.4 Training Workflow
    - 5.5 Automatic Resume from Checkpoint
  - 6. Testing and Validation Principles
    - 6.1 Code Generation
    - 6.2 Dynamic Execution
    - 6.3 Evaluation Metrics
    - 6.4 Random Test Mode
  - 7. Key Technical Choices and Trade-offs
  - 8. Performance and Reliability
  - 9. Conclusion

# Detailed System Principle for Mathematical Reasoning Training

This document explains the design principles, core algorithms, engineering decisions, and internal workings of the entire system. It is intended for developers who need to understand, maintain, or extend the framework.

## 1. Overall Goal and Design Philosophy

Goal: To enable the DeepSeek-Coder-6.7B-Instruct model to generate verifiable, step-by-step Python code for mathematical problems. Executing the generated code yields exact numeric or symbolic results together with a transparent chain of intermediate computations, all of which can be checked automatically.

Design Philosophy:

- Verifiability first: All outputs are executable code. Execution results can be compared with ground truth, which removes any reliance on the model's own arithmetic.
- Explicit reasoning chain: The step-by-step process is expressed through code variables and `print` statements rather than ambiguous natural language.
- Symbolic computation offloaded: Heavy symbolic work (differentiation, integration, limits, etc.) is delegated to SymPy. The model only learns to write the right code, not to perform the calculations itself, which reduces its burden and improves accuracy.
- Modularity and extensibility: Six independent generators share a common interface, making it easy to add new problem types.

## 2. Data Generation Module Core Principles

### 2.1 Random Expression Generation (Exemplified by Derivatives)

Principle: Recursively build a random mathematical expression tree, controlling depth and complexity.

- Atoms: constants, the variable `x`, powers, and elementary functions (exp, log, sin, cos, etc.).
- Combinators: arithmetic operations (+, −, ×, ÷) and function composition (sin, exp, log, etc.).
- Depth control: `max_depth` limits the recursion level to prevent overly deep expressions.
- Fixed random seed: ensures reproducibility.

The generator returns a SymPy expression object, which is then used for symbolic computations.

### 2.2 Step-by-Step Calculation Generation

Key Algorithm: Recursively traverse the expression tree, apply the differentiation rule that matches each node type, and record the intermediate results.

- Power rule: $\frac{d}{dx}\,x^n = n\,x^{n-1}$
- Chain rule: $\frac{d}{dx}\,f(g(x)) = f'(g(x))\,g'(x)$
- Product rule: $\frac{d}{dx}\,(u\,v) = u'v + u\,v'$
- Quotient rule: $\frac{d}{dx}\,\frac{u}{v} = \frac{u'v - u\,v'}{v^2}$

For each sub-expression, the function calls itself to obtain the derivative of that sub-expression and then combines the pieces. It records the rule name, the intermediate expression, and the final simplified result.

Advantage: Derivatives are computed by SymPy, but every intermediate value is "snapshotted" during generation and hard-coded into the generated code. This avoids redundant computation at runtime and keeps execution fast.

### 2.3 Code Construction Template

Every generated Python script follows a fixed skeleton:

1. Import SymPy functions.
2. Define symbolic variables.
3. Print the problem statement.
4. Perform step-by-step assignments (e.g., `u = 2*x`, `du = 2`) and `print` the intermediate results.
5. Compute and print the final answer (for verification).

This structure cleanly separates logic from presentation, making the scripts easy to execute and validate.

## 3. Special Handling for Each Problem Type

### 3.1 Arithmetic

- Uses a Lark grammar to parse the expression into a tree.
- Step-by-step evaluation: each operator node records the values of its children and the result of the current operation.
- Generates a sequence of variables (`step_1`, `step_2`, …) and prints each intermediate value.
- The final result is computed with Python float arithmetic.

### 3.2 Integration

- Uses SymPy's `manualintegrate.integral_steps` to obtain a rule tree for the integral.
- Recursively traverses the rule tree, identifying rule types (substitution, integration by parts, power rule, etc.).
- Generates the corresponding code snippets (e.g., substitution: `u = ...`, `du = diff(u, x)`; parts: `u = ...`, `dv = ...`).
- Careful variable naming and back-substitution ensure correctness.

### 3.3 Limits

- Pre-defines common limit types (factorisation, important limits, L'Hôpital's rule, etc.).
- Each type corresponds to a fixed code template with standardised step descriptions and a final `limit` call.
- For diversity, parameters (e.g., the point of approach) are randomly varied.

### 3.4 Linear Algebra

- Randomly generates vector sets and, with a certain probability, constructs linearly dependent ones (via linear combination).
- The code uses SymPy's `Matrix.rank()` to compute the rank and compares it with the number of vectors.
- Steps: display the matrix, compute the rank, print the conclusion.

### 3.5 Sequence Limits (Limsup/Liminf)

- Supports piecewise sequences with modulus `m`; the user provides one expression per residue class.
- Computes the limit set of each subsequence (considering left/right endpoints and approach direction).
- The final intersection (liminf) and union (limsup) are calculated.
- The code includes built-in `limit_seq` and interval-handling functions.

## 4. Data Aggregation Script Principles

### 4.1 Weighted Mixed Generation

- The user supplies weights (e.g., `arithmetic:0.5`, `derivative:1`).
- Weights are normalised to build a list of task types (e.g., `[derivative, derivative, integral]`).
- Each iteration randomly picks one type from this list and generates one sample.

### 4.2 Resume from Checkpoint

- After each successful generation, the sample is immediately appended to the JSONL file, followed by a `flush()` to ensure persistence.
- At startup, the script counts the existing lines. If the count is already ≥ the target, it exits; otherwise it continues from where it left off.
- The progress bar starts from the existing count.

### 4.3 Timeout Control

- Uses `signal.alarm` to set a time limit per sample.
- If the timeout fires, a `TimeoutError` is raised, caught by the outer loop, counted as a failure, and the sample is skipped. This prevents the process from hanging indefinitely on complex SymPy simplifications.

### 4.4 Progress and Error Handling

- Uses `tqdm` to show real-time progress, rate, and failure count.
- Failed samples (timeouts, exceptions, or a generator returning `None`) do not count as successes, but they do consume attempts (up to 10× the target number).

## 5. LoRA Fine-Tuning Principles

### 5.1 LoRA Overview

- LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method.
- It freezes the pretrained model and inserts trainable low-rank decomposition matrices (`A` and `B`) alongside the Q/K/V/O projection matrices in the Transformer layers.
- Only these low-rank matrices are trained. They account for far fewer parameters (< 1%), which dramatically reduces memory use and training time.

### 5.2 Adaptation to DeepSeek-Coder

- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, covering the attention projections and the feed-forward layers.
- LoRA rank `r=8`, scaling factor `alpha=16`, dropout `0.05`.

### 5.3 4-bit Quantisation

- Uses `BitsAndBytesConfig` to load the base model in 4-bit (NF4 type) with double quantisation (`double_quant=True`) to further reduce memory.
- `prepare_model_for_kbit_training` prepares the quantised model for training (e.g., gradient checkpointing).

### 5.4 Training Workflow

- Dataset: JSONL with `instruction` (problem) and `output` (code).
- Input format: `### Problem:\n{instruction}\n### Solution:\n{output}`.
- Tokenization: padded with `eos_token`, truncated to `max_seq_length=1024`.
- Optimiser: AdamW, learning rate `2e-4`, 100 warmup steps.
- Evaluation: run on the validation split every `eval_steps`; the best checkpoint is kept.

### 5.5 Automatic Resume from Checkpoint

- Scans the output directory for `checkpoint-*` folders and selects the latest by modification time.
- Passes the path to `trainer.train(resume_from_checkpoint=path)` to restore the full training state (optimiser, scheduler, etc.).

## 6. Testing and Validation Principles

### 6.1 Code Generation

- Uses the same prompt template (`### Problem:\n...\n### Solution:\n`) to query the model.
- Extracts the code block after `### Solution:` (stripping markdown code fences).

### 6.2 Dynamic Execution

- Writes the extracted code to a temporary `.py` file and spawns a subprocess to run it.
- Captures stdout and stderr with a timeout (default 10 seconds).
- If the return code is 0, the code is considered executable; otherwise, the errors are recorded.

### 6.3 Evaluation Metrics

- Primary metric: execution success rate (the percentage of generated scripts that run without errors).
- The metric can be extended to compare the printed "answer" with ground truth, but the current version only verifies successful execution.

### 6.4 Random Test Mode

If no test file is provided, the script automatically calls the generators to create random problems, testing the model's generalisation ability.

## 7. Key Technical Choices and Trade-offs

| Choice | Rationale | Alternatives |
| --- | --- | --- |
| Output Python code instead of natural language | Executable, verifiable correctness | Natural language CoT (not verifiable) |
| Use SymPy for intermediate values | Accurate symbolic computation, avoids arithmetic errors | Model-directed computation (unreliable) |
| Hard-code intermediate results in code | Simplifies runtime computation, speeds execution | Dynamic derivation (still possible, but heavier) |
| LoRA fine-tuning instead of full fine-tuning | Memory-friendly, fast, supports incremental updates | Full fine-tuning (more resource-intensive, risk of catastrophic forgetting) |
| 4-bit quantisation | Enables running 6.7B models on a 16 GB V100 | 8-bit or no quantisation (VRAM insufficient) |
| Timeout mechanism | Prevents rare complex expressions from hanging the process | No timeout (risky) |

## 8. Performance and Reliability

- Generation speed: ~2–5 samples/sec (depending on complexity and hardware), i.e. 0.2–0.5 sec per sample on average.
- Failure rate: typically < 5%, mainly from limits and integrals with pathological expressions; timeouts mitigate them.
- Training resources: with 4-bit quantisation, batch size 4, and gradient accumulation 4, memory usage is ~14 GB.
- Extensibility: adding a new problem type only requires implementing a generator and registering it; the aggregation, training, and testing scripts remain unchanged.

## 9. Conclusion

This system builds a complete closed-loop pipeline through carefully designed data generation, structured code outputs, parameter-efficient fine-tuning, and automated validation. Its core strengths are verifiable correctness, transparent reasoning, and resource-friendly deployment. It provides a solid technical foundation for applications such as mathematics education and automatic problem solving.
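As an illustration of the script skeleton from section 2.3, here is a minimal sketch of what one generated derivative script might look like. The problem (`sin(2*x)`), the step wording, and the variable names `outer` and `result` are assumptions chosen for illustration; only the overall skeleton and the `u = 2*x`, `du = 2` style of hard-coded intermediate values come from the text.

```python
# Hypothetical example of a generated script; it follows the five-part skeleton
# (imports, symbols, problem statement, hard-coded steps, final answer).
from sympy import symbols, sin, cos, diff, simplify

x = symbols("x")

print("Problem: differentiate sin(2*x) with respect to x")

# Step 1 (chain rule): inner function and its derivative, hard-coded at generation time
u = 2*x
du = 2
print("Step 1: u =", u, "and du/dx =", du)

# Step 2: derivative of the outer function sin(u) with respect to u
outer = cos(u)
print("Step 2: d/du sin(u) =", outer)

# Step 3: combine with the chain rule
result = outer * du
print("Step 3: d/dx sin(2*x) =", result)

# Final answer, cross-checked against SymPy's own derivative for verification
expected = diff(sin(2*x), x)
print("Final answer:", result)
print("Matches SymPy:", simplify(result - expected) == 0)
```

Because the intermediate values are literal assignments, the only symbolic work done at runtime in this sketch is the final cross-check.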
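The weighted mixing and resume logic of sections 4.1, 4.2, and 4.4 could be sketched roughly as follows. This is not the actual aggregation script: the function names (`build_task_pool`, `count_existing`, `generate_dataset`), the normalisation by the smallest weight, and the generator signature `generator(rng)` are all assumptions made for illustration.

```python
import json
import os
import random


def build_task_pool(weights):
    """Turn weights into a list of task names, normalising by the smallest weight.

    Assumed scheme: {"arithmetic": 0.5, "derivative": 1} becomes one
    "arithmetic" entry and two "derivative" entries.
    """
    smallest = min(w for w in weights.values() if w > 0)
    pool = []
    for name, weight in weights.items():
        if weight > 0:
            pool.extend([name] * round(weight / smallest))
    return pool


def count_existing(path):
    """Count the non-empty lines already present in the JSONL file."""
    if not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def generate_dataset(path, target, weights, generators, seed=0):
    """Append samples until the file holds `target` lines; return the final count."""
    rng = random.Random(seed)
    pool = build_task_pool(weights)
    done = count_existing(path)      # resume from whatever is already on disk
    attempts = 0
    max_attempts = 10 * target       # failed samples still consume attempts
    with open(path, "a", encoding="utf-8") as handle:
        while done < target and attempts < max_attempts:
            attempts += 1
            task = rng.choice(pool)
            sample = generators[task](rng)
            if sample is None:       # a generator returning None counts as a failure
                continue
            handle.write(json.dumps(sample, ensure_ascii=False) + "\n")
            handle.flush()           # persist each sample immediately
            done += 1
    return done
```

Calling `generate_dataset` a second time with a larger target only appends the missing samples, which is the behaviour the resume step relies on.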
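The per-sample timeout described in section 4.3 can be sketched with `signal.alarm` as below. The helper names (`run_with_timeout`, `generate_many`) are hypothetical, and the sketch assumes a Unix system with the call made from the main thread, since `SIGALRM` is not available elsewhere.

```python
import signal
import time


def _raise_timeout(signum, frame):
    # Signal handler: turn the alarm into an ordinary Python exception.
    raise TimeoutError("sample generation timed out")


def run_with_timeout(func, seconds):
    """Call func() and raise TimeoutError if it runs longer than `seconds` whole seconds."""
    previous = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.alarm(seconds)
    try:
        return func()
    finally:
        signal.alarm(0)                          # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the previous handler


def generate_many(generators, seconds=1):
    """Run each generator under a time limit; timeouts are counted and skipped."""
    successes, failures = [], 0
    for generator in generators:
        try:
            successes.append(run_with_timeout(generator, seconds))
        except TimeoutError:
            failures += 1
    return successes, failures
```

Cancelling the alarm in the `finally` block matters: without it, a slow-to-fire alarm from one sample could interrupt the next one.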
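For the extraction and execution steps of sections 6.1 to 6.3, a minimal sketch might look like this. The function names (`extract_code`, `run_code`, `success_rate`) and the shape of the result dictionary are assumptions; the `### Solution:` marker, the temporary `.py` file, the subprocess, and the 10-second default timeout follow the description in the text.

```python
import os
import subprocess
import sys
import tempfile

FENCE = "`" * 3  # markdown code fence, built this way to keep the example readable


def extract_code(model_output):
    """Return the text after '### Solution:' with markdown fence lines removed."""
    head, marker, tail = model_output.partition("### Solution:")
    body = tail if marker else head
    lines = [ln for ln in body.splitlines() if not ln.strip().startswith(FENCE)]
    return "\n".join(lines).strip() + "\n"


def run_code(code, timeout=10):
    """Write code to a temporary .py file, run it in a subprocess, report the outcome."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".py", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(code)
        path = handle.name
    try:
        proc = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=timeout
        )
        return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": "timeout"}
    finally:
        os.remove(path)


def success_rate(results):
    """Fraction of runs whose return code was 0 (the primary metric)."""
    return sum(r["ok"] for r in results) / len(results) if results else 0.0
```

Running each script in a separate process keeps a crash or an infinite loop in generated code from taking down the evaluation script itself.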