Abstract
Understanding source code depends strongly on identifier names, so developers spend considerable time choosing appropriate names for variables, functions, classes, and files. Choosing a useful name manually is time-consuming and difficult. Various techniques have been proposed for automatic identifier name recommendation, but most of this work targets method and class names. Module names play an important role when software libraries are reused to develop new source code: a good module name communicates purpose, while an inappropriate one creates ambiguity and frustrates developers. To the best of our knowledge, no prior work addresses module name suggestion or the analysis of module names. In this paper, we focus on module names and propose an approach for module name prediction. First, we extract module files from online Python projects to create a corpus. Next, we apply preprocessing steps to prepare the data for the prediction models. We construct four similarity-based models and three sequence generation models. The sequence generation models generate a module name as a sequence of tokens, whereas the similarity-based models can only suggest previously stored module names. Experimental results indicate that the TF-IDF model performed best among all the models, followed by the three sequence generation models.
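To make the similarity-based idea concrete, the following is a minimal sketch of how a TF-IDF model could suggest a stored module name for new source text. It is not the implementation evaluated in this paper. The toy corpus, the regex tokenizer, the smoothed IDF formula, and the names `CORPUS`, `build_index`, and `suggest_names` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: stored module name -> text taken from that module.
CORPUS = {
    "json_parser": "parse json string decode object load",
    "http_client": "send http request response url get post",
    "image_resize": "image resize width height scale pixel",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic runs (assumed preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(corpus):
    """Return IDF weights and one TF-IDF vector per stored module name."""
    docs = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    n_docs = len(docs)
    # Document frequency: in how many modules each term occurs.
    df = Counter(term for counts in docs.values() for term in counts)
    # Smoothed IDF, an assumed variant; other weightings are possible.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = {
        name: {t: c * idf[t] for t, c in counts.items()}
        for name, counts in docs.items()
    }
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_names(source_text, idf, vectors, top_k=1):
    """Rank stored module names by similarity to the new source text."""
    counts = Counter(tokenize(source_text))
    # Terms never seen in the corpus carry no weight in the query vector.
    query = {t: c * idf[t] for t, c in counts.items() if t in idf}
    ranked = sorted(
        vectors, key=lambda name: cosine(query, vectors[name]), reverse=True
    )
    return ranked[:top_k]


if __name__ == "__main__":
    idf, vectors = build_index(CORPUS)
    print(suggest_names("def get(url): return send_request(url)", idf, vectors))
```

Because such a model only ranks names that are already stored, it cannot produce a name absent from the corpus, which is the limitation that the sequence generation models are meant to address.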