Data Science
Hello there! My name is Lucas Welch, and I am a Canadian-American currently living in Irvine, California. I am a student at Claremont McKenna College pursuing a dual major in Data Science and Economics. I am also concurrently receiving an MS in Financial Engineering from Claremont Graduate University. My passion lies in creative problem solving, financial engineering, probability and statistics, software engineering, startup businesses, and machine learning. In my spare time, I delve into probability within games like chess, poker, and Monopoly; explore cybersecurity; play baseball; DJ and curate playlists; and design websites/apps. I am also a casual Lord of the Rings fan... I've only read the books and watched the movies a few dozen times! Overall, I enjoy leveraging data-driven insights and applying them to real-world scenarios, continually honing my skills as a data scientist.
Skills
Experience
Education
The best language models are trained on more than 1 trillion tokens of English language text. Most languages, however, do not have such large training datasets available. We investigate an extremely data-limited regime where only 80,000 tokens of text are available in the form of a high-quality Latin textbook. We also introduce a new dataset for evaluating Latin models that contains over 5,000 high-quality human annotated questions and answers that were originally designed to assess human learning. We find that the small, high-quality textbook data is sufficient to improve the performance of language models on this new dataset.
Oil has been the world's primary energy source since the mid-19th century, supplying 33% of the world's energy and underpinning modern society. The oil industry significantly impacts the US economy, and the country's standing in the world is closely tied to the oil and gas trade, making a thriving oil industry critical to the country's global market position. Although renewable energy is an urgent goal, it is still a long way from meeting current demands, and the US must rely on its own energy sources to avoid dependence on foreign countries, which could control the US economy through the price of oil and gas exports.
This paper introduces linear algebra as a fundamental subject with applications across many fields. It explores vector spaces, linear transformations, and systems of linear equations, and their relevance to real-world problems. The paper also examines the properties of matrices, including their inverses, determinants, and eigenvectors, and their critical role in solving linear algebra problems. The goal is to provide a comprehensive introduction to the fundamental principles of linear algebra and to demonstrate its applications in image processing, data analysis, and machine learning.
This Monopoly algorithm calculates the probability of landing on each square, taking into account dice rolls, Community Chest and Chance card instructions, and the likelihood of going to jail. The simulation is run 500,000 times to ensure accurate data collection, using an object-oriented Python script that integrates the game board, dice, and special rules to generate randomized games. By combining Python and R, the project captures the complexity of the game's rules and chance events.
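The landing-probability idea can be illustrated with a much smaller Monte Carlo sketch. This is not the project's script: it assumes a single player, ignores Chance and Community Chest cards and the doubles rule, and only models the "Go To Jail" square. The names `simulate_landings`, `BOARD_SIZE`, `GO_TO_JAIL`, and `JAIL` are hypothetical.

```python
import random
from collections import Counter

BOARD_SIZE = 40   # squares on a standard Monopoly board
GO_TO_JAIL = 30   # index of the "Go To Jail" square
JAIL = 10         # index of the Jail / Just Visiting square


def simulate_landings(num_rolls, seed=None):
    """Estimate how often each square ends a turn (simplified rules)."""
    rng = random.Random(seed)
    position = 0
    counts = Counter()
    for _ in range(num_rolls):
        # Roll two six-sided dice and move around the board.
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        position = (position + roll) % BOARD_SIZE
        # Simplification: the only special rule modeled is Go To Jail.
        if position == GO_TO_JAIL:
            position = JAIL
        counts[position] += 1
    return {square: counts[square] / num_rolls for square in range(BOARD_SIZE)}


probabilities = simulate_landings(100_000, seed=0)
most_visited = max(probabilities, key=probabilities.get)
print(most_visited, round(probabilities[most_visited], 4))
```

Even under these reduced rules, the Jail square collects visits from two sources (direct landings and Go To Jail), which is why it tends to stand out in the estimated distribution.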
As a Software Developer and Founder at Athena Network LLC, I played a crucial role in bringing the Athena Network mobile application to life. I was responsible for developing and implementing the software that powers the platform, which aims to provide college students with comprehensive resources related to their courses. I was involved in all stages of the development process, from ideation and planning to coding, testing, and deployment. I gained technical experience with React Native, JSX, JavaScript, and HTML/CSS.
As a League of Legends player, I find it fascinating to use R to predict the winning outcome of a game. By analyzing the correlation between various factors, such as Blue Total Gold and Winning Rate, I can better understand what leads to a higher winning rate for my team. I explore the right-skewed distribution of total gold above 18000 and the left-skewed distribution of total gold below 16000 to see how gold accumulation impacts the winning rate. I also consider the Blue Tower Objectives and the number of Heralds obtained to make more accurate predictions, which has helped me become a better player and strategist.
I love Sudoku puzzles, but I find it frustrating when I get stuck, so I wanted to create a program to help me solve them. I wrote a Python program that solves a Sudoku puzzle using a backtracking algorithm. It creates a GUI using the tkinter module to display the Sudoku board, and allows the user to input a custom puzzle to solve. This program works for every Sudoku puzzle I've tested, prove me wrong...
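For readers unfamiliar with backtracking, here is a minimal sketch of the technique applied to Sudoku. It is not the project's code and leaves out the tkinter GUI entirely. It assumes the board is a 9x9 list of lists with 0 marking an empty cell, and the function names `is_valid` and `solve` are hypothetical.

```python
def is_valid(board, row, col, digit):
    """Return True if digit can be placed at (row, col) without a conflict."""
    if digit in board[row]:
        return False
    if any(board[r][col] == digit for r in range(9)):
        return False
    # Check the 3x3 box that contains (row, col).
    top, left = 3 * (row // 3), 3 * (col // 3)
    for r in range(top, top + 3):
        for c in range(left, left + 3):
            if board[r][c] == digit:
                return False
    return True


def solve(board):
    """Fill the board in place; return True if a solution was found."""
    for row in range(9):
        for col in range(9):
            if board[row][col] == 0:
                for digit in range(1, 10):
                    if is_valid(board, row, col, digit):
                        board[row][col] = digit
                        if solve(board):
                            return True
                        # Dead end: undo the guess and try the next digit.
                        board[row][col] = 0
                # No digit fits this cell, so an earlier guess was wrong.
                return False
    # No empty cells remain: the board is solved.
    return True
```

Calling `solve(board)` on a puzzle mutates it into a completed grid when one exists. The search is exhaustive, so it either finds a solution or returns False for an unsolvable puzzle, although the hardest inputs can take noticeably longer.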
This project aims to monitor the spread of the coronavirus on social media by scanning all geotagged tweets sent in 2020. To handle the large volume of data, the project employs the MapReduce divide-and-conquer paradigm to create parallel code. A map.py file tracks the usage of specific coronavirus-related hashtags on Twitter at the language and country level, and a shell script runs it across the full set of data. The results are consolidated into two files for easier analysis and visualized using bar and line charts.
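The map and reduce steps described above can be sketched roughly as follows. This is an illustrative assumption, not the project's map.py: it assumes each input line is one tweet as a JSON object with `text` and `lang` fields, counts by language only, and uses a hypothetical hashtag list and the hypothetical function names `map_tweets` and `reduce_counts`.

```python
import json
from collections import Counter, defaultdict

# Hypothetical hashtag list; the project's actual list is not given in the text.
HASHTAGS = ["#coronavirus", "#covid19"]


def map_tweets(lines, hashtags=HASHTAGS):
    """Map step: count hashtag usage per language for one chunk of tweets."""
    counts = defaultdict(Counter)
    for line in lines:
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of failing the whole chunk
        if not isinstance(tweet, dict):
            continue
        text = tweet.get("text", "").lower()
        lang = tweet.get("lang", "und")  # "und" marks an undetermined language
        for hashtag in hashtags:
            if hashtag in text:
                counts[hashtag][lang] += 1
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one total."""
    total = defaultdict(Counter)
    for partial in partials:
        for hashtag, by_lang in partial.items():
            total[hashtag].update(by_lang)
    return total
```

Because each chunk is mapped independently, the map step can run in parallel over separate files, and only the small count dictionaries need to be combined at the end.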
Copyright © 2023 Lucas Welch. All rights reserved.