About Me

Hello there! My name is Lucas Welch, and I am a Canadian-American currently living in Irvine, California. I am a student at Claremont McKenna College pursuing a dual major in Data Science and Economics. I am also concurrently receiving an MS in Financial Engineering from Claremont Graduate University. My passion lies in creative problem solving, financial engineering, probability and statistics, software engineering, startup businesses, and machine learning. In my spare time, I delve into probability within games like chess, poker, and Monopoly; explore cybersecurity; play baseball; DJ and curate playlists; and design websites and apps. I am also a casual Lord of the Rings fan... I've only read the books and watched the movies a few dozen times! Overall, I enjoy drawing insights from data and applying them to real-world scenarios, continually honing my skills as a data scientist.

  • Programming
    Python, R, SQL, Dart, HTML/CSS, Vim, JS, LaTeX, & Shell
  • Artificial Intelligence
    LLM training with small, high-quality datasets (textbooks)
  • Quantitative trading
    Derivative pricing and risk management
  • App Development
    Android/iOS App Developer for startup companies
  • Web Development
    Web-dev freelancer
  • UI/UX
    Designing Web/App interfaces
  • May 2023 - Current
    AI Researcher at Claremont McKenna College
  • May 2023 - Current
    Software Engineer at CellectGen, Inc.
  • May 2023 - Current
    Data Analyst at CellectGen, Inc.
  • Aug 2022 - Jan 2023
    Data Analyst at Pence Wealth Management
  • May 2022 - Aug 2021
    Financial Analyst Intern at Pence Wealth Management
  • 2021 - 2022
    Founder at Athena Network LLC
  • 2023 - Current
    Claremont Graduate University,
    Master of Science, Financial Engineering
  • 2021 - Current
    Claremont McKenna College,
    Bachelor's degree, Data Science & Economics
  • 2019 - 2021
    Arnold O. Beckman High School in Irvine, California
    Summa cum laude
  • 2017 - 2019
    Orange Lutheran High School in Orange, California
    Papers & Articles

    Coming soon...

    One Textbook Is All You Need (LLM Fine-tuning)

    The best language models are trained on more than 1 trillion tokens of English language text. Most languages, however, do not have such large training datasets available. We investigate an extremely data-limited regime where only 80,000 tokens of text are available in the form of a high-quality Latin textbook. We also introduce a new dataset for evaluating Latin models that contains over 5,000 high-quality human annotated questions and answers that were originally designed to assess human learning. We find that the small, high-quality textbook data is sufficient to improve the performance of language models on this new dataset.


    The Importance of Oil in The Global Economy

    Oil has been the world's primary energy source since the mid-19th century, supplying 33% of the world's energy and underpinning modern society. The oil industry significantly impacts the US economy, and the country's standing in the world is closely tied to the oil and gas trade, making a thriving oil industry critical to the country's global market position. Although renewable energy is an urgent goal, it is still a long way from meeting current demands, and the US must rely on its own energy sources to avoid dependence on foreign countries, which could control the US economy through the price of oil and gas exports.


    Linear Algebra Made Simple

    This paper introduces linear algebra as a fundamental subject with applications across many fields. It explores vector spaces, linear transformations, and systems of linear equations, and their relevance to real-world problems. The paper also examines matrices, their inverses, determinants, and eigenvectors, and their critical role in solving linear algebra problems. The goal is to provide a comprehensive introduction to the fundamental principles of linear algebra and to demonstrate its applications in areas including image processing, data analysis, and machine learning.


    My Work

    Monopoly Algorithm

    This Monopoly algorithm calculates the probability of landing on each square, taking into account dice rolls, Community Chest and Chance card instructions, and the likelihood of going to jail. An object-oriented Python script integrates the game board, dice, and special rules to generate randomized games, and the simulation is run 500,000 times so that the estimated probabilities are stable. Combining Python and R, the project analyzes the game while accounting for the complexities of its rules and chance events.
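
    A minimal sketch of this kind of simulation, not the original script: it tracks a single token, handles dice rolls, the Go To Jail square, and the three-doubles rule, and deliberately leaves out Chance and Community Chest cards. The function name, constants, and seed parameter are assumptions made for illustration.

```python
import random
from collections import Counter

BOARD_SIZE = 40   # standard Monopoly board
JAIL = 10         # index of the Jail / Just Visiting square
GO_TO_JAIL = 30   # index of the Go To Jail square


def simulate_landings(num_turns, seed=0):
    """Estimate how often a token ends a move on each square.

    Simplified model (assumption): no Chance or Community Chest cards,
    and a jailed player simply rolls normally on the next turn.
    """
    rng = random.Random(seed)
    counts = Counter()
    position = 0
    for _ in range(num_turns):
        doubles = 0
        while True:
            d1, d2 = rng.randint(1, 6), rng.randint(1, 6)
            if d1 == d2:
                doubles += 1
            if doubles == 3:
                # Third consecutive double: go straight to jail.
                position = JAIL
                counts[position] += 1
                break
            position = (position + d1 + d2) % BOARD_SIZE
            if position == GO_TO_JAIL:
                # Landing on Go To Jail moves the token and ends the turn.
                position = JAIL
                counts[position] += 1
                break
            counts[position] += 1
            if d1 != d2:
                break  # no double, so the turn is over
    total = sum(counts.values())
    return {square: counts[square] / total for square in range(BOARD_SIZE)}


if __name__ == "__main__":
    probabilities = simulate_landings(500_000)
    top = sorted(probabilities, key=probabilities.get, reverse=True)[:5]
    for square in top:
        print(square, round(probabilities[square], 4))
```

    Even in this reduced model, the jail square stands out because three separate routes lead to it; adding the card decks would shift the remaining probabilities further.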

    Athena Network

    As a Software Developer and Founder at Athena Network LLC, I played a crucial role in bringing this mobile application to life. I was responsible for developing and implementing the software that powers the Athena Network platform, which aims to provide college students with comprehensive resources related to their courses. I was involved in all stages of the development process, from ideation and planning to coding, testing, and deployment. Along the way, I gained technical experience with React Native, JSX, JavaScript, and HTML/CSS.

    Latin-Davinci & Latin-LLaMA

    The best language models are trained on more than 1 trillion tokens of English language text. Most languages, however, do not have such large training datasets available. We investigate an extremely data-limited regime where only 80,000 tokens of text are available in the form of a high-quality Latin textbook. We also introduce a new dataset for evaluating Latin models that contains over 5,000 high-quality human annotated questions and answers that were originally designed to assess human learning. We find that the small, high-quality textbook data is sufficient to improve the performance of language models on this new dataset.

    League of Legends Winning Prediction

    As a League of Legends player, I find it fascinating to use R to predict the winning outcome of a game. By analyzing the correlation between various factors, such as Blue Total Gold and Winning Rate, I can better understand what leads to a higher winning rate for my team. I explore the right-skewed distribution of total gold above 18000 and the left-skewed distribution of total gold below 16000 to see how gold accumulation impacts the winning rate. I also consider the Blue Tower Objectives and the number of Heralds obtained to make more accurate predictions, which has helped me become a better player and strategist.

    Sudoku Solver

    I love Sudoku puzzles, but I find it frustrating when I get stuck, so I wanted to create a program to help me solve them. I wrote a Python program that solves a Sudoku puzzle using a backtracking algorithm. It creates a GUI using the tkinter module to display the Sudoku board and allows the user to input a custom puzzle to solve. This program works for every Sudoku puzzle I've tested, prove me wrong...
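
    The backtracking idea can be sketched roughly as follows. This is a generic illustration without the tkinter GUI, not the program's actual source; the function names and the convention of using 0 for an empty cell are assumptions.

```python
def find_empty(board):
    """Return (row, col) of the first empty cell, or None if the board is full."""
    for row in range(9):
        for col in range(9):
            if board[row][col] == 0:
                return row, col
    return None


def is_valid(board, row, col, digit):
    """Check that digit does not already appear in the row, column, or 3x3 box."""
    if digit in board[row]:
        return False
    if any(board[r][col] == digit for r in range(9)):
        return False
    box_row, box_col = 3 * (row // 3), 3 * (col // 3)
    for r in range(box_row, box_row + 3):
        for c in range(box_col, box_col + 3):
            if board[r][c] == digit:
                return False
    return True


def solve(board):
    """Fill the board in place; return True if a solution was found."""
    cell = find_empty(board)
    if cell is None:
        return True  # no empty cells left, so the puzzle is solved
    row, col = cell
    for digit in range(1, 10):
        if is_valid(board, row, col, digit):
            board[row][col] = digit
            if solve(board):
                return True
            board[row][col] = 0  # undo the guess and try the next digit
    return False  # no digit fits here, so an earlier guess was wrong


if __name__ == "__main__":
    puzzle = [
        [5, 3, 0, 0, 7, 0, 0, 0, 0],
        [6, 0, 0, 1, 9, 5, 0, 0, 0],
        [0, 9, 8, 0, 0, 0, 0, 6, 0],
        [8, 0, 0, 0, 6, 0, 0, 0, 3],
        [4, 0, 0, 8, 0, 3, 0, 0, 1],
        [7, 0, 0, 0, 2, 0, 0, 0, 6],
        [0, 6, 0, 0, 0, 0, 2, 8, 0],
        [0, 0, 0, 4, 1, 9, 0, 0, 5],
        [0, 0, 0, 0, 8, 0, 0, 7, 9],
    ]
    if solve(puzzle):
        for line in puzzle:
            print(" ".join(str(d) for d in line))
```

    Because every guess is undone when it leads to a dead end, the search eventually tries every consistent assignment, which is why it finds a solution whenever one exists.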

    Covid-19 Twitter Analysis

    This project aims to monitor the spread of the coronavirus on social media by scanning all geotagged tweets sent in 2020. To handle the large volume of data, the project employs the MapReduce divide-and-conquer paradigm to create parallel code. A map.py file tracks the usage of specific coronavirus-related hashtags on Twitter at the language and country level, and a shell script runs it across the full set of input files. The results are consolidated into two files for easier analysis and visualized using bar and line charts.
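
    A minimal sketch of the map and reduce steps for the language-level counts, assuming each tweet is a dict with "text" and "lang" keys. The hashtag list, function names, and input shape are hypothetical and do not reproduce the project's map.py.

```python
from collections import Counter, defaultdict

# Hypothetical hashtag list; the project's actual list is not given here.
HASHTAGS = ["#coronavirus", "#covid19"]


def map_tweets(tweets, hashtags=HASHTAGS):
    """Map step: count hashtag usage per language for one chunk of tweets."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        text = tweet["text"].lower()
        for tag in hashtags:
            # Simple substring match, so "#covid19" also matches longer tags.
            if tag in text:
                counts[tag][tweet["lang"]] += 1
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one combined result."""
    total = defaultdict(Counter)
    for partial in partials:
        for tag, by_lang in partial.items():
            total[tag].update(by_lang)
    return total


if __name__ == "__main__":
    # Made-up example tweets, standing in for two separate input files.
    chunk_a = [
        {"text": "Stay safe #COVID19", "lang": "en"},
        {"text": "#coronavirus noticias", "lang": "es"},
    ]
    chunk_b = [
        {"text": "#covid19 update", "lang": "en"},
        {"text": "nothing relevant here", "lang": "en"},
    ]
    combined = reduce_counts([map_tweets(chunk_a), map_tweets(chunk_b)])
    for tag, by_lang in combined.items():
        print(tag, dict(by_lang))
```

    Since each chunk is mapped independently, the map step can run as separate processes, one per input file, with only the small count tables passed to the reduce step.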


    Contact Me

    [email protected]

    714-866-8283
