Usage
The Cleora interface consists of just a few simple options, which usually work well with their default settings.
File Source
How the input file is uploaded. Cleora currently supports the following upload methods:
- Local Disk - upload a file from your local filesystem
- Google Drive - upload from Google Drive
- Previous Runs - reuse a previously uploaded file, for example to experiment with a different dimensionality or maximum number of iterations
Embedding Initialization
Do you have your own initial embeddings? The default answer is NO, in which case Cleora initializes the embeddings itself. However, if you already have meaningful entity embeddings, you might want to try them as Cleora's initial embeddings. These can be, for example, text embeddings computed from item descriptions, or image embeddings computed from item images. For more information, see Your Own Initial Embeddings.
If you have your own initial embeddings and select YES, you can upload them in three ways: from local disk, from Google Drive, or from previously uploaded files.
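The difference between answering NO and YES can be pictured with a small sketch. This is not Cleora's implementation: `init_embeddings` is a hypothetical helper, the uniform [-1, 1] default initialization is an assumption, and the actual upload format for initial embeddings is described in Your Own Initial Embeddings. The sketch only shows that supplied vectors replace the default starting point and that their width fixes the embedding dimensionality.

```python
import numpy as np


def init_embeddings(num_entities, dim=1024, own=None, seed=0):
    """Hypothetical helper: choose the starting embedding matrix.

    If `own` is given (one row per entity, e.g. text or image embeddings),
    it is used as-is and its width decides the dimensionality, so `dim`
    is ignored. Otherwise rows are drawn uniformly from [-1, 1] (an assumption).
    """
    if own is not None:
        own = np.asarray(own, dtype=np.float32)
        if own.ndim != 2 or own.shape[0] != num_entities:
            raise ValueError("expected a 2-D array with one row per entity")
        return own
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0, 1.0, size=(num_entities, dim)).astype(np.float32)


default = init_embeddings(5, dim=8)  # NO: random starting point
text_vectors = np.random.default_rng(1).normal(size=(5, 384))  # e.g. 384-dim text embeddings
custom = init_embeddings(5, own=text_vectors)  # YES: dimensionality follows the input
print(default.shape, custom.shape)  # (5, 8) (5, 384)
```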
Dimensions
The dimensionality of the embeddings you will receive. Values such as 512 or 1024 usually work well. Larger embeddings can store more information, but they often need more training data to be meaningful.
Note: This option disappears if you upload your own initial embeddings. In that case, the embedding dimensionality is the same as that of the uploaded initial embeddings.
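To get a feel for what the dimensionality choice means in practice, here is a back-of-the-envelope size calculation. It assumes one dense vector per entity stored as 32-bit floats (4 bytes per value); how Cleora actually stores or exports embeddings is not specified here, so treat the numbers as an order-of-magnitude guide. `embedding_size_bytes` is a hypothetical helper.

```python
def embedding_size_bytes(num_entities: int, dim: int, bytes_per_value: int = 4) -> int:
    """Size of a dense num_entities x dim matrix; 4 bytes per value assumes float32."""
    return num_entities * dim * bytes_per_value


# 10,000,000 entities: the upper end of the range mentioned under Max Number of Iterations.
for dim in (512, 1024):
    gib = embedding_size_bytes(10_000_000, dim) / 2**30
    print(f"{dim} dims: {gib:.1f} GiB")
# 512 dims: 19.1 GiB
# 1024 dims: 38.1 GiB
```

The size grows linearly with the dimensionality, so doubling it from 512 to 1024 doubles the size of the result.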
Max Number of Iterations
The number of iterations of the Cleora algorithm. For typical enterprise relational datasets (10,000 to 10,000,000 users), the best value is usually between 2 and 4. However, we have also encountered cases where the best results were obtained with 1 or with 15 iterations. Intuitively, too few iterations can produce uninformative embeddings, while too many can lead to overfitting.
We recommend starting with smaller values, because overfitting produces meaningless embeddings.
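To build intuition for what the iteration count controls, here is a simplified sketch, assuming the propagation scheme described in the Cleora paper: each iteration replaces a node's vector with the average of its neighbours' vectors (a multiplication by the random-walk transition matrix) and then L2-normalizes every row. The dense NumPy matrix, the plain undirected graph, and the uniform [-1, 1] initialization are simplifying assumptions, and `cleora_like_embed` and `mean_cosine` are illustrative names, not part of Cleora.

```python
import numpy as np


def cleora_like_embed(edges, num_nodes, dim=64, iterations=3, seed=0):
    """Toy Cleora-style propagation (hypothetical sketch, not Cleora's code)."""
    # Dense adjacency matrix of an undirected graph.
    adj = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1.0
    # Random-walk transition matrix: each row sums to 1 (isolated nodes stay all-zero).
    transition = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    # Assumed initialization: uniform random values in [-1, 1].
    emb = np.random.default_rng(seed).uniform(-1.0, 1.0, size=(num_nodes, dim))
    for _ in range(iterations):
        emb = transition @ emb  # average the neighbours' vectors
        norms = np.linalg.norm(emb, axis=1, keepdims=True)
        emb = emb / np.maximum(norms, 1e-12)  # L2-normalize each row
    return emb


def mean_cosine(emb):
    """Mean cosine similarity over all distinct pairs of (unit-length) rows."""
    sims = emb @ emb.T
    n = emb.shape[0]
    return (sims.sum() - np.trace(sims)) / (n * (n - 1))


# Two triangles joined by the edge 2-3: a small connected, non-bipartite graph.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
for k in (1, 3, 30):
    print(k, round(mean_cosine(cleora_like_embed(edges, 6, iterations=k)), 3))
```

In this toy setting, the average similarity between node vectors approaches 1 for large iteration counts: every node ends up with nearly the same vector, which is one concrete way that too many iterations can yield meaningless embeddings.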