Mutual Fund Data Provider

Identified mutual funds that appeared under different names and assigned each fund a unique ID. Fund names were matched using maximum likelihood estimation, with a prior probability set for unmatched names and a threshold score used to filter out ambiguous matches.
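The matching step can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not the project's actual method: token-set Jaccard similarity stands in for the name comparison, the two likelihood functions are toy choices, and the values `prior_unmatched=0.7` and `threshold=0.5` are hypothetical. It also uses a fixed posterior rule rather than estimating parameters from data, so it is a simplified stand-in for the maximum likelihood procedure.

```python
import re


def tokens(name):
    """Lowercase a fund name and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def similarity(a, b):
    """Jaccard overlap of the two token sets, in [0, 1] (assumed comparison)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_probability(a, b, prior_unmatched=0.7):
    """Posterior probability that two names refer to the same fund.

    Assumed toy likelihoods: P(s | match) is proportional to s and
    P(s | unmatched) is proportional to 1 - s, where s is the similarity.
    prior_unmatched must lie strictly between 0 and 1.
    """
    s = similarity(a, b)
    match = (1.0 - prior_unmatched) * s
    unmatched = prior_unmatched * (1.0 - s)
    return match / (match + unmatched)


def assign_ids(names, threshold=0.5, prior_unmatched=0.7):
    """Reuse the ID of the best-scoring known fund, or issue a new ID."""
    representatives = []  # representatives[i] is the first name seen for ID i
    ids = {}
    for name in names:
        scores = [match_probability(name, rep, prior_unmatched)
                  for rep in representatives]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            ids[name] = best  # confident match: share the existing ID
        else:
            ids[name] = len(representatives)  # ambiguous or new: fresh ID
            representatives.append(name)
    return ids
```

Raising `prior_unmatched` or `threshold` makes the matcher more conservative, so more borderline names receive their own ID instead of being merged.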

Analysing data

Implementing a Comprehensive Data Warehouse

Set up a highly secure data warehouse with a data ingest and reporting tool that enforces full authentication and authorization.

Transformation Engine for Data Conversion

Created a transformation engine built on the command pattern and customisable via JSON-based configuration. A Spark-based data ingest pipeline was tailored for the purpose. The solution was then implemented on Azure Data Lake and Databricks.
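A minimal sketch of how a command-pattern engine can be driven by JSON configuration is given here. It is an illustration under stated assumptions, not the project's code: plain Python dictionaries stand in for Spark DataFrames, and the command names (`rename`, `uppercase`), the class names, and the `steps`/`command`/`args` configuration keys are all hypothetical.

```python
import json


class RenameColumn:
    """Command: move a value from one field name to another."""

    def __init__(self, source, target):
        self.source = source
        self.target = target

    def execute(self, record):
        record = dict(record)  # copy so the input record is not mutated
        record[self.target] = record.pop(self.source)
        return record


class UpperCase:
    """Command: upper-case the string stored in one field."""

    def __init__(self, column):
        self.column = column

    def execute(self, record):
        record = dict(record)
        record[self.column] = record[self.column].upper()
        return record


# Registry mapping configuration names to command classes (hypothetical names).
COMMANDS = {"rename": RenameColumn, "uppercase": UpperCase}


def build_pipeline(config_text):
    """Turn a JSON document into an ordered list of command objects."""
    config = json.loads(config_text)
    return [COMMANDS[step["command"]](**step.get("args", {}))
            for step in config["steps"]]


def run_pipeline(pipeline, record):
    """Apply each command in order, passing the result to the next one."""
    for command in pipeline:
        record = command.execute(record)
    return record
```

With this layout, adding a transformation means writing one class with an `execute` method and registering it, while the order and arguments of the steps are changed in the JSON alone.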

Effective Data Storage & Repository for a Major Job Portal

Developed a fully functional data lake, Spark-based transformers to run text-processing algorithms, data partitioning, and a Spark-based data pipeline. NLP metadata augmentation is another major highlight of the project.

