Jianfei Xu

Data Analysis / Operations

Applied statistics student focused on data workflows, AI annotation platforms, and research operations.

Strong foundation in data engineering and AI toolchain development, with hands-on experience in Python, SQL, data cleaning, structured data processing, building a database of over one million records, private deployment and Chinese localization of Label Studio, multimodal annotation, and model output evaluation.


About

Data, platform, and research collaboration experience

B.S. student in Applied Statistics at Anhui University (expected graduation: 2027).

Experience supporting research-oriented data workflows, including collection, cleaning, filtering, analysis, annotation, evaluation, and technical documentation.

Has participated in large-model research, the private deployment of an AI annotation platform, and a summer research program at the National University of Singapore.

Education

Applied statistics and analysis foundation

Anhui University

B.S. in Applied Statistics

2023 - 2027

Relevant coursework

Time Series Analysis, Applied Stochastic Processes, Data Structures, Statistical Forecasting and Decision-Making, Multivariate Statistical Analysis, Java Programming
  • Achieved strong academic performance in Java Programming, Data Structures, and mathematics-related courses.
  • Completed interdisciplinary training in artificial intelligence, data science, machine learning, and deep learning applications.
  • Participated in a summer research program at the National University of Singapore.
  • Co-first author of one paper published at the ICCGV conference (EI-indexed).

Experience

Research and enterprise data internships

Work spanning large-model data, annotation platforms, quality assessment, and technical documentation.

2025.10 - 2026.04

Research Assistant

Institute of Information Engineering, Chinese Academy of Sciences

  • Supported end-to-end data workflows for large-model-related research tasks, including data collection, cleaning, filtering, processing, and analysis.
  • Participated in private deployment and Chinese localization of the Label Studio annotation platform for secure intranet research environments.
  • Contributed to multimodal data annotation and model output evaluation.
  • Prepared deployment guides, operation manuals, and other technical documentation.

2024.07 - 2024.08

Data Analysis Intern

iFlytek Co., Ltd.

  • Provided data analysis and quality assessment support for large-model-related tasks.
  • Analyzed, filtered, cleaned, and organized model-generated data for training and evaluation workflows.
  • Assisted with data quality checks and usability improvements, supporting workflows that reached an accuracy of approximately 80%.
  • Helped manage procurement data and maintained the related records.

2023.12 - 2024.02

Data Processing Intern

iFlytek Co., Ltd.

  • Handled data sorting, cleaning, classification, and formatting for large-model-related tasks.
  • Performed data screening, deduplication, and structured processing to support model training and data analysis.
  • Assisted in preliminary checking of model outputs and documented data-related issues.
  • Supported internal data documentation and information management.

Projects

Data engineering, annotation platform, and model evaluation projects

Data Engineer

End-to-End Construction of a Million-Record Medical Database

  • Contributed to building a high-quality medical database for precision medicine research, covering data acquisition, cleaning, de-identification, structuring, version management, and quality control.
  • Integrated heterogeneous data from electronic medical records, laboratory reports, and other sources.
  • Established standardized field-mapping rules and quality control indicators for over one million sensitive medical records.
  • Introduced automated validation mechanisms and missing-value handling strategies, achieving a data completeness rate of over 96% and annotation consistency above 94%.

Platform Lead

Label Studio Private Deployment and Chinese Localization

  • Led the Docker-based private deployment of Label Studio in a secure intranet research environment.
  • Completed comprehensive Chinese localization of interface text, interaction prompts, error messages, and backend management modules, reaching localization coverage above 98%.
  • Conducted full-function testing, cross-browser compatibility verification, and Chinese display optimization.
  • Prepared deployment and annotation operation documentation.
  • Built a collaborative workflow combining manual annotation with multi-model pre-annotation; the related methodology contributed to one SCI paper currently under review.

Researcher, National University of Singapore

Summer Research Program

  • Participated in research focused on local deployment of large models, visual question answering, and model output evaluation.
  • Deployed multiple large models, including Flamingo-3B, Flamingo-9B, Qwen-13B, and ChatGPT-MINI.
  • Used deployed models for VQA tasks and developed scripts for multi-dimensional scoring and analysis of generated outputs.
  • Collaborated with the research team to complete and publish a related paper.

Skills

Skill set for data analysis and operations roles

Programming & Data Processing

Python, SQL, Java (basic), Data cleaning, Structured data processing, Data quality validation, Missing-value handling, Field mapping

Database & Data Engineering

Million-scale data processing, Database construction, Data ingestion, Version management, Quality-control workflow design

AI & Machine Learning Tools

Label Studio, Docker, Model data processing, Multimodal annotation, VQA, Model output evaluation

Statistics & Analysis

Time Series Analysis, Applied Stochastic Processes, Multivariate Statistical Analysis, Statistical Forecasting and Decision-Making

Documentation & Collaboration

Technical documentation, Deployment manuals, Annotation guides, Research collaboration

Resume

Download the public resume

Downloadable files are generated from the public-facing content and do not link to the original source PDFs.

Public resume

Includes education, experience, projects, skills, and public email; omits phone number, age, and gender.

Contact

Reach out through the public email

Only contact details suitable for public display are shown.

Preferred work location

Hefei