Chengshuo Dai

chengshuo.dai23@gmail.com

Walking through the mud, gazing at the stars.

"脚踏泥泞,仰望星空。"

"Who doesn't yearn to ride the crest of the coming tide?"

About Me

I’m a self-taught AI/LLM engineer in progress, currently transitioning into the field by studying GenAI and building real-world LLM apps.

My academic background spans Information Management, Finance, and Biostatistics, but a growing fascination with LLMs led me to pivot toward AI engineering. I began this journey in 2025, and since then I’ve focused on studying how modern large language model systems work.

Through self-study and hands-on projects, I’ve been exploring the end-to-end LLM stack, including an engineer’s intuition for the math and statistics behind it: from model fundamentals (architectures, pretraining, fine-tuning, RLHF) to inference optimization and downstream applications such as RAG pipelines, agentic workflows, and AI-powered search systems.

I believe that in the AI era, the most valuable asset is not a single background but the curiosity, discipline, and ability to learn fast and adapt. As a self-driven builder, I’m deeply curious about how new AI technologies work, and I enjoy exploring them from first principles and turning them into working systems and practical products.

Here’s a quote that deeply inspires me:
Background defines the past, but problems define the future.

Chengshuo Dai

"In the age of Software 3.0, there are no fixed backgrounds, only problems waiting to be solved."

Education

Yale University

2025.08 - 2027.05

Biostatistics Data Science

Advanced coursework in statistical methods, machine learning, data analysis, and computational biology. Research focus on NLP and knowledge graphs in genomics.

Gerstein Lab Research, NLP & Genomics

Capital University of Economics and Business

2021.09 - 2025.06

Information Management and Information Systems (Finance)

Comprehensive coursework in Java, Python, Database Systems, Web Design, and System Design. Strong foundation in information systems, data management, and software development.

Academic Excellence Scholarship (2022-2023, 2023-2024); Merit Student (2022-2023, 2023-2024)

Experience

ETH | Machine Learning Engineer Intern (Newark, CA)

2025.09 - 2025.12

Agentic Multimodal Search System

Built a proof of concept (POC) for an agentic multimodal search system to improve search coverage and relevance, supporting hybrid retrieval and temporal queries. Developed a multimedia indexing pipeline and an LLM query-understanding module for NL-to-DSL parsing, and fine-tuned domain-specific LLMs to optimize retrieval and server-side performance.

Yale Gerstein Lab | Research Assistant

2025.09 - 2025.11

BioGraphRAG Biomedical Retrieval System

Built a GraphRAG-style biomedical mechanism retrieval system with DAG reasoning constraints to reduce semantic drift. Designed a comprehensive retrieval pipeline integrating Neo4j multi-hop search, semantic index construction, and FAISS reranking to generate traceable mechanism explanations.

Skills

Languages

Python, R, TypeScript, SQL, Java

Frameworks/Tools

PyTorch, LangChain, LLaMA Factory, Elasticsearch, Docker, FastAPI, Linux, Git, AWS

Projects

A context-aware QA system with conversation memory management built on LangChain and DeepSeek. Features a scalable microservices architecture with FastAPI, SQLAlchemy ORM, and Docker containerization, supporting streaming responses and multi-user sessions.

LangChain, DeepSeek, FastAPI, TypeScript, SQLAlchemy, Docker

An end-to-end RAG-based document QA system enabling natural language interaction with unstructured PDFs. Implements a complete LLM pipeline including document parsing, embedding generation, and FAISS-based semantic retrieval using Python, OpenAI API, and Streamlit.

Python, LangChain, OpenAI API, Streamlit, FAISS, RAG
DCS Blog

My Blog

Thoughts, learnings, and reflections on software engineering, AI, and life.

Motto

Always do the meaningful things.