joaovitormsilva/README.md

👋 Hi there! Welcome to my GitHub profile

I'm João Vitor Martins da Silva

🎓 Bachelor's student in Information Systems at the University of São Paulo (USP)
💼 Data Engineer at F1rst Digital Services


💡 About Me

I'm a Data Engineer passionate about building scalable data pipelines and productionizing machine learning models. With a technical background in Electronics and hands-on experience in automated testing, I now focus on data engineering, cloud optimization, and AI/ML operations.

Key Achievements:

  • 🚀 Productionized 80+ machine learning models using Kedro and Azure Databricks
  • 💰 Reduced cloud infrastructure costs by 34% through systematic optimization
  • ✅ Maintained 96% regression test coverage for critical financial applications
  • 📊 Built real-time data pipelines processing data from multiple sources

🔧 Tech Stack

Languages & Frameworks:
Python SQL Spark

Cloud & Big Data:
Databricks Azure

Tools & DevOps:
Jenkins Git Docker Kafka

Expertise:

  • Data Engineering: ETL/ELT Pipelines, Data Modeling, Data Quality
  • ML Operations: Model Deployment, Kedro, Control-M Orchestration
  • Cloud Optimization: FinOps, Resource Management, Cost Reduction
  • Programming: Python, PySpark, SQL

📌 Featured Projects

FinData-Intelligence

An end-to-end Data Engineering & AI platform for automated bank statement processing, intelligent expense classification using LLMs, and investment portfolio analytics.

Tech Stack: Python, LLMs, Data Engineering, AI


Vendas-Livrarias

Complete project for data ingestion, analysis, and quality testing of sales data using Spark (PySpark) on the Databricks platform.

Tech Stack: PySpark, Databricks, Data Quality Testing


Monitoramento-Sensor-Iot

Real-time data processing system for IoT sensors using Kafka for messaging, Spark Structured Streaming for processing, and PostgreSQL for storage.

Tech Stack: Kafka, Spark Streaming, PostgreSQL, Docker, Python


📚 Certifications & Training

  • Fundamentals of Data Engineering — Joe Reis
  • Practical Deep Learning for Coders — Jeremy Howard (Fast.ai)
  • Data Science, Spark & Data Visualization — Alura
  • Software Quality — Federado Foundation & Professional
  • DevOps & Git — F1rst Digital Services

🌐 Connect With Me


📧 Email: joao.vitormsilva@usp.br
💼 LinkedIn: linkedin.com/in/vitorjoao
🌍 Location: São Paulo, Brazil




🎯 Current Focus

  • 🔭 Building scalable data pipelines with Azure Databricks
  • 🌱 Learning advanced MLOps and real-time streaming architectures
  • 👯 Looking to collaborate on open-source data engineering projects

⭐️ If you're working on innovative data engineering challenges or looking for a passionate data engineer with proven results, let's connect!

Pinned

  1. FinData-Intelligence (Public)

    An end-to-end Data Engineering & AI platform for automated bank statement processing, intelligent expense classification using LLMs, and investment portfolio analytics.

    Python

  2. Vendas-Livrarias (Public)

    Repository for a bookstore sales analysis project

    Jupyter Notebook

  3. Monitoramento-Sensor-Iot (Public)

    Real-time IoT sensor monitoring system using a streaming architecture with a Producer and a Consumer

    Python