Best Practices

Best Practices for Uploading Files to AI Data Analyst Chat

For the best experience, make sure your file is structured correctly.

David Bressler

Aug 1, 2024

Well-organized, clean and properly structured data allows Formula Bot to analyze your data with high confidence and accuracy.

We recommend the following best practices:

▪️Headers

Always include a header row at the top of your dataset. This allows the Data Analyzer to understand what each column represents. The header row must be the first row of the sheet and must start in the first column.

Ensure that your table has only one header row at the top. Multiple headers or sub-headers cause confusion during data interpretation. The header row should clearly define the content of each column without needing additional sub-headers. If your data is complex and seems to require multiple header levels, consider restructuring it for simplicity or splitting it into multiple tables.

▪️File Formats

Formula Bot accepts the following spreadsheet files: .xls, .xlsx, and .csv. When preparing the data, save your file in one of the recognized formats to ensure compatibility.
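As a quick pre-upload check, the extension rule above can be sketched in a few lines of Python. The function name `is_accepted_file` is made up for this illustration and is not part of Formula Bot. It only looks at the file name, not at the file contents.

```python
from pathlib import Path

# Extensions listed in this article as accepted spreadsheet formats.
ACCEPTED_EXTENSIONS = {".xls", ".xlsx", ".csv"}

def is_accepted_file(filename: str) -> bool:
    """Return True if the file name ends in one of the accepted extensions."""
    return Path(filename).suffix.lower() in ACCEPTED_EXTENSIONS

print(is_accepted_file("sales_2024.XLSX"))
print(is_accepted_file("notes.txt"))
```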

▪️Uniform Data Types

Each column in your dataset should ideally represent a single data type (e.g., all numbers, all dates, all strings). Mixing data types within a single column can result in errors or misinterpretations.

❌ Example of an improper data type. In this example, there are mixed data types within a single column (text and numbers).

✅ Example of a proper data type. In this example, there is only one data type within a single column (numbers).
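If you want to spot mixed columns before uploading, a rough check along these lines may help. This is only a sketch for CSV text using Python's standard library. The helper names (`classify`, `mixed_type_columns`) and the simple "number vs. text" rule are assumptions made for the illustration, not how Formula Bot inspects your file.

```python
import csv
import io

def classify(value: str) -> str:
    """Label a cell as 'empty', 'number', or 'text' (a deliberately simple rule)."""
    text = value.strip()
    if not text:
        return "empty"
    try:
        # Assumption: commas are thousands separators, e.g. "1,000".
        float(text.replace(",", ""))
        return "number"
    except ValueError:
        return "text"

def mixed_type_columns(csv_text: str) -> dict[str, set[str]]:
    """Return the columns whose non-empty cells fall into more than one type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen: dict[str, set[str]] = {name: set() for name in reader.fieldnames or []}
    for row in reader:
        for name in seen:
            kind = classify(row.get(name) or "")
            if kind != "empty":
                seen[name].add(kind)
    return {name: kinds for name, kinds in seen.items() if len(kinds) > 1}

# The "quantity" column mixes numbers with the word "twelve".
sample = "item,quantity\napple,10\npear,twelve\nplum,7\n"
print(mixed_type_columns(sample))
```

Here the `item` column is all text and passes, while `quantity` is reported because it contains both numbers and text.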

▪️Avoid Improper Data Formatting

Formula Bot might struggle with inconsistent spreadsheet structures or data that isn't formatted correctly. Here are some typical problems to steer clear of:

❌ Multiple sections. Ensure your spreadsheet doesn't have multiple sections. There must be only one table in the spreadsheet. In this example, there are three sections.

❌ Empty headers, or headers that are not in the top row.

❌ Empty rows or columns.
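Some of the problems above can be caught with a short script before you upload. The sketch below works on CSV text with Python's standard library and reports empty headers, empty rows, and columns without data. The function name `structure_problems` and the message wording are invented for this example. It does not detect multiple sections, which usually need a visual check.

```python
import csv
import io

def structure_problems(csv_text: str) -> list[str]:
    """List empty headers, empty rows, and empty columns found in CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header, data = rows[0], rows[1:]
    width = max(len(row) for row in rows)

    # Every column needs a non-blank name in the first row.
    for index in range(width):
        name = header[index] if index < len(header) else ""
        if not name.strip():
            problems.append(f"column {index + 1}: empty header")

    # A row with no non-blank cell counts as empty (row 1 is the header).
    for row_number, row in enumerate(data, start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"row {row_number}: empty row")

    # A column with no non-blank cell below the header counts as empty.
    if data:
        for index in range(width):
            if not any(index < len(row) and row[index].strip() for row in data):
                problems.append(f"column {index + 1}: no data")

    return problems

sample = "name,,total\nA,,1\n\nB,,2\n"
for problem in structure_problems(sample):
    print(problem)
```

A clean file returns an empty list, so the script can double as a simple pass/fail check.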

▪️Proper Data Formatting

Well-cleaned and structured data enables Formula Bot to perform accurate and error-free data analysis and visualization. Below are best practices to follow:

✅ Tabular format. Your data should be in the form of rows of records below the headers.

✅ Column headers. Descriptive column headers must be in the first row and no other row. All data must fall immediately under the header row.

▪️Consistent Formatting

Maintain a consistent format, especially for dates and numbers. Decide on a date format (e.g., DD/MM/YYYY, MM-DD-YYYY) and number format (e.g., 1,000, 1000) and stick to it throughout the dataset.
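One way to enforce a single format is to rewrite every value before uploading. The sketch below assumes you already know which format your dates are currently in, and it picks YYYY-MM-DD and plain digits as the target formats. Those targets are choices made for this example, not a Formula Bot requirement, and the function names are hypothetical.

```python
from datetime import datetime

def normalize_date(value: str, source_format: str = "%d/%m/%Y") -> str:
    """Parse a date in a known source format and rewrite it as YYYY-MM-DD."""
    return datetime.strptime(value.strip(), source_format).strftime("%Y-%m-%d")

def normalize_number(value: str) -> str:
    """Drop thousands separators so '1,000' and '1000' look the same.

    Assumption: commas are thousands separators, not decimal marks.
    """
    return value.strip().replace(",", "")

print(normalize_date("31/01/2024"))              # DD/MM/YYYY input
print(normalize_date("01-31-2024", "%m-%d-%Y"))  # MM-DD-YYYY input
print(normalize_number("1,000"))
```

`normalize_date` raises a `ValueError` when a value does not match the stated source format, which is a convenient way to find the cells that break the pattern.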

▪️No Merged Cells

Merged cells can confuse the AI. Always unmerge cells and make sure each value sits in its own cell.

▪️Handling Missing Data

Decide on a consistent way to represent missing data. Common methods include leaving the cell blank or using a placeholder like "N/A".
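If a file already mixes several markers for missing values, they can be collapsed into one. The following is a minimal sketch. The list of tokens treated as "missing" and the function name `normalize_missing` are assumptions for this example, so adjust them to match your own data. Pass only the data rows, not the header row.

```python
# Assumed spellings of "missing"; compared after trimming and lowercasing.
MISSING_TOKENS = {"", "n/a", "na", "null", "none", "-"}

def normalize_missing(rows: list[list[str]], placeholder: str = "") -> list[list[str]]:
    """Replace every cell that looks like a missing value with one placeholder."""
    return [
        [placeholder if cell.strip().lower() in MISSING_TOKENS else cell for cell in row]
        for row in rows
    ]

data = [["A", "N/A"], ["B", "-"], ["C", "5"], ["D", ""]]
print(normalize_missing(data, placeholder="N/A"))
```

With the default `placeholder=""`, all missing values become blank cells instead.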

▪️Encoding Error

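If the upload fails or characters appear garbled, the file's encoding may be the cause. This mostly affects .csv files. Save the file with UTF-8 encoding, for example by choosing "CSV UTF-8 (Comma delimited)" when saving from Excel.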
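If you cannot re-save the file from a spreadsheet program, a CSV can also be converted with a short script. The sketch below assumes that a file which is not valid UTF-8 was written in Windows-1252 (`cp1252`), which is common for older Excel exports but is only a guess. The function name `to_utf8` is hypothetical.

```python
def to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return the bytes unchanged if they are valid UTF-8, else re-encode them.

    Assumption: non-UTF-8 input uses the fallback encoding. If that guess is
    wrong, the result may contain wrong characters or decoding may fail.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(fallback).encode("utf-8")

legacy_bytes = "name,city\nZoë,Málaga\n".encode("cp1252")
print(to_utf8(legacy_bytes).decode("utf-8"))
```

To convert a file on disk, read it with `open(path, "rb")`, pass the bytes through `to_utf8`, and write the result to a new file opened with `"wb"`.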