Mutual Fund Data Provider
Identified mutual funds that were listed under different names and assigned each one a unique ID. Fund names were matched using maximum likelihood estimation, with a prior probability set for unmatched names and a threshold score used to filter out ambiguous matches.
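The matching idea can be illustrated with a minimal sketch. The project's actual likelihood model is not described here, so the sketch substitutes a character-level string similarity raised to a power; the constants `UNMATCHED_PRIOR`, `UNMATCHED_LIKELIHOOD`, `SHARPNESS`, and `THRESHOLD`, and the function names, are all hypothetical. Because a prior is involved, the decision rule shown is closer to picking the hypothesis with the highest posterior than to plain maximum likelihood.

```python
from difflib import SequenceMatcher

# All four constants are illustrative assumptions, not values from the project.
UNMATCHED_PRIOR = 0.2        # prior probability that a name matches no known fund
UNMATCHED_LIKELIHOOD = 0.1   # flat likelihood assigned to the "no match" hypothesis
SHARPNESS = 10               # exponent that penalises weak similarities
THRESHOLD = 0.6              # minimum posterior required to accept a match


def normalise(name):
    """Lower-case and collapse whitespace so cosmetic differences are ignored."""
    return " ".join(name.lower().split())


def likelihood(name, canonical):
    """Stand-in likelihood: string similarity in [0, 1], sharpened by an exponent."""
    ratio = SequenceMatcher(None, normalise(name), normalise(canonical)).ratio()
    return ratio ** SHARPNESS


def match_fund(name, known_funds):
    """Return (fund_id, posterior), or (None, posterior) when the match is ambiguous.

    known_funds maps a unique fund ID to its canonical name.
    """
    if not known_funds:
        return None, 0.0
    # Spread the remaining prior mass evenly over the known funds.
    candidate_prior = (1.0 - UNMATCHED_PRIOR) / len(known_funds)
    weights = {
        fund_id: candidate_prior * likelihood(name, canonical)
        for fund_id, canonical in known_funds.items()
    }
    unmatched_weight = UNMATCHED_PRIOR * UNMATCHED_LIKELIHOOD
    total = sum(weights.values()) + unmatched_weight
    best_id = max(weights, key=weights.get)
    posterior = weights[best_id] / total
    if posterior < THRESHOLD:
        return None, posterior  # too ambiguous: leave the name unmatched
    return best_id, posterior
```

A name that differs from a known fund only in case or spacing is assigned that fund's ID, while a name that resembles none of the known funds falls below the threshold and stays unmatched, so it can be given a new ID or sent for review.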
Implementing a Comprehensive Data Warehouse
Set up a highly secure data warehouse, equipped with a data ingestion and reporting tool that enforced full authentication and authorization.
Transformation Engine for Data Conversion
Created a transformation engine that is built on the command pattern and is customisable via JSON-based configuration. A Spark-based data ingestion pipeline was tailored for the purpose. The solution was then implemented on Azure Data Lake and Databricks.
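A minimal sketch of how a command-pattern engine driven by JSON configuration might look is given below. It uses plain Python dictionaries as records rather than Spark DataFrames so that it runs on its own; the class names (`Rename`, `Cast`), the `REGISTRY` mapping, and the configuration keys (`steps`, `op`) are hypothetical and not taken from the project.

```python
import json
from abc import ABC, abstractmethod


class Command(ABC):
    """One transformation step applied to a single record (a dict)."""

    @abstractmethod
    def execute(self, record):
        """Return a transformed copy of the record."""


class Rename(Command):
    """Rename a field, leaving the record unchanged if the field is absent."""

    def __init__(self, source, target):
        self.source, self.target = source, target

    def execute(self, record):
        out = dict(record)
        if self.source in out:
            out[self.target] = out.pop(self.source)
        return out


class Cast(Command):
    """Convert a field to one of a few supported types."""

    TYPES = {"int": int, "float": float, "str": str}

    def __init__(self, field, to):
        self.field, self.caster = field, self.TYPES[to]

    def execute(self, record):
        out = dict(record)
        out[self.field] = self.caster(out[self.field])
        return out


# Maps the "op" name used in the configuration to a command class.
REGISTRY = {"rename": Rename, "cast": Cast}


def build_pipeline(config_json):
    """Turn a JSON configuration string into an ordered list of commands."""
    commands = []
    for step in json.loads(config_json)["steps"]:
        params = dict(step)
        command_cls = REGISTRY[params.pop("op")]
        commands.append(command_cls(**params))
    return commands


def run(commands, record):
    """Apply each command in order and return the final record."""
    for command in commands:
        record = command.execute(record)
    return record


CONFIG = """
{"steps": [
  {"op": "rename", "source": "amt", "target": "amount"},
  {"op": "cast", "field": "amount", "to": "float"}
]}
"""

result = run(build_pipeline(CONFIG), {"amt": "12.5"})
```

With this layout, adding a new transformation means writing one more `Command` subclass and registering it; existing pipelines change only through their JSON configuration. In a Spark setting, each command would presumably operate on a DataFrame instead of a dictionary.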
Effective Data Storage & Repository for a Major Job Portal
Developed a fully functional data lake, Spark-based transformers to run text-processing algorithms, data partitioning, and a Spark-based data pipeline. NLP metadata augmentation was another major highlight of the project.