...
The following points summarize how I plan to approach the project:
- Given a dataset, calculate the field metrics. (Jeremy, the primary mentor for this project, told me that the algorithms for these already exist; we would just need to implement them.)
Some field metrics, for example Hmax and UqVal, depend on the size of the dataset. Instead of using their absolute values, we would express them as percentages, so that they are comparable across datasets of different sizes.
As I discussed with Jeremy, since we have only one training dataset, it would be better to store the agreed-upon set of decision trees than to rebuild them from the same dataset again and again.
Jeremy has written Python code that builds the decision trees from the training dataset. I would run that code to obtain the agreed-upon set of decision trees and then encode the trees in whatever format we find best (probably XML). These stored decision trees would then serve as a resource for our system.
Having done that, I would write code that reads the stored decision trees, takes the field metrics calculated from the dataset as input, and uses the trees to determine which fields to use for Patient Matching.
- Then I will build a UI for this system and integrate the entire system with the Patient Matching Module.
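To illustrate the percentage idea for size-dependent metrics, here is a minimal sketch for UqVal only. It assumes (these are my assumptions, not the definitions from the existing algorithms) that a dataset is a list of records, each a dict from field name to value, that UqVal is the number of distinct non-empty values in a field, and that the percentage is taken relative to the number of records. The helper names are hypothetical.

```python
"""Sketch: express a size-dependent field metric (UqVal) as a percentage.

Assumptions (hypothetical, not from the existing algorithms):
- a dataset is a list of records, each a dict {field name: value};
- UqVal = number of distinct non-empty values in a field;
- the percentage is relative to the number of records.
"""

from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def unique_value_count(records: List[Record], field: str) -> int:
    """Count distinct non-empty values of `field` (assumed UqVal definition)."""
    values = {r.get(field) for r in records}  # missing field -> None
    values.discard(None)
    values.discard("")
    return len(values)


def as_percentage(count: int, total: int) -> float:
    """Express a count as a percentage of `total` records (0.0 if empty)."""
    if total == 0:
        return 0.0
    return 100.0 * count / total


def uqval_percentages(records: List[Record]) -> Dict[str, float]:
    """Return {field: UqVal as a percentage of the number of records}."""
    fields = sorted({f for r in records for f in r})
    n = len(records)
    return {f: as_percentage(unique_value_count(records, f), n) for f in fields}


sample = [
    {"first_name": "Ann", "gender": "F"},
    {"first_name": "Bob", "gender": "M"},
    {"first_name": "Ann", "gender": "F"},
    {"first_name": "Cy", "gender": None},
]
print(uqval_percentages(sample))  # {'first_name': 75.0, 'gender': 50.0}
```

The same `as_percentage` normalization would apply to the other size-dependent metrics such as Hmax, once their exact definitions are taken from the existing algorithms.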
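The step of reading a stored tree and using it to pick fields could look roughly like the sketch below. The XML schema (`<node metric=… threshold=…>` with `<le>`/`<gt>` branches and `<leaf decision=…>`), the metric names `UqValPct`/`HmaxPct`, and the thresholds are all hypothetical placeholders; the real structure would follow whatever Jeremy's Python code produces and whatever format we settle on.

```python
"""Sketch: read a decision tree encoded as XML and use it to decide which
fields to use for Patient Matching.

The schema, metric names and thresholds are hypothetical placeholders.
"""

import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical encoding: an internal <node> compares one metric with a
# threshold; the <le> branch is taken if metric <= threshold, else <gt>.
TREE_XML = """
<tree name="example">
  <node metric="UqValPct" threshold="10">
    <le><leaf decision="skip"/></le>
    <gt>
      <node metric="HmaxPct" threshold="30">
        <le><leaf decision="skip"/></le>
        <gt><leaf decision="use"/></gt>
      </node>
    </gt>
  </node>
</tree>
"""


def classify(node: ET.Element, metrics: Dict[str, float]) -> str:
    """Walk from `node` down to a leaf and return its decision."""
    while node.tag != "leaf":
        metric = node.get("metric")
        threshold = float(node.get("threshold"))
        branch = "le" if metrics[metric] <= threshold else "gt"
        # Each branch element wraps exactly one child: a <node> or a <leaf>.
        node = node.find(branch)[0]
    return node.get("decision")


def recommend_fields(tree_xml: str,
                     field_metrics: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the (sorted) fields whose metrics lead to a "use" leaf."""
    root = ET.fromstring(tree_xml.strip())
    start = root[0]  # first element under <tree>
    return sorted(f for f, m in field_metrics.items()
                  if classify(start, m) == "use")


field_metrics = {
    "first_name": {"UqValPct": 75.0, "HmaxPct": 60.0},
    "gender": {"UqValPct": 0.5, "HmaxPct": 5.0},
    "zip": {"UqValPct": 15.0, "HmaxPct": 20.0},
    "phone": {"UqValPct": 90.0, "HmaxPct": 95.0},
}
print(recommend_fields(TREE_XML, field_metrics))  # ['first_name', 'phone']
```

With several stored trees, we would still need to decide how to combine their answers (for example, a majority vote); that is an open design question rather than something this sketch settles.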
...
Proposed Timeline
A rough project timeline is as follows:
...