Best Practices for Uploading Files to AI Data Analyst Chat

For an optimal experience, make sure your file is structured correctly. Learn more!

David Bressler

Aug 1, 2024

To get the full power of Formula Bot's Data Analyzer, your data needs to be structured in a way it can analyze. In this article, we outline best practices for preparing and uploading files, with examples of both proper and improper data formatting.

Headers

Always include a header row at the top of your dataset so the Data Analyzer understands what each column represents. The header row must be the first row of the sheet and must start in the first column (cell A1).

Make sure your table has only one header row, at the top. Multiple headers or sub-headers cause confusion during data interpretation. The header row should clearly define the content of each column without needing sub-headers. If your data is complex and seems to require multiple header levels, consider restructuring it for simplicity or splitting it into separate tables, each in its own file.
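
If your sheet has a two-level header (for example, a "Sales" cell spanning a "Q1" and a "Q2" column), one way to restructure it is to merge the two header rows into a single row of combined names. This Python sketch assumes the data has already been read into a list of rows (for example with Python's csv module) and that a blank cell in the top header row belongs to the label on its left; `flatten_headers` is a hypothetical helper, not part of Formula Bot.

```python
def flatten_headers(group_row, sub_row):
    """Combine a group header row and a sub-header row into one header row.

    Assumes a blank cell in group_row belongs to the nearest label on its
    left, which is how a merged group cell usually looks once exported.
    """
    flattened = []
    current_group = ""
    for group, sub in zip(group_row, sub_row):
        if group.strip():
            current_group = group.strip()
        sub = sub.strip()
        if current_group and sub:
            flattened.append(f"{current_group} {sub}")
        else:
            # Only one level has a label, so use whichever one is present.
            flattened.append(current_group or sub)
    return flattened


# Example: "Sales" and "Returns" each span a Q1 and a Q2 column.
rows = [
    ["Region", "Sales", "", "Returns", ""],
    ["", "Q1", "Q2", "Q1", "Q2"],
    ["East", "100", "120", "5", "7"],
]
header = flatten_headers(rows[0], rows[1])
table = [header] + rows[2:]  # one header row, data directly below it
print(header)  # ['Region', 'Sales Q1', 'Sales Q2', 'Returns Q1', 'Returns Q2']
```

The result has a single header row, so it can be saved back to a file and uploaded as one table.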

File Formats

Formula Bot accepts the following spreadsheet formats: .xls, .xlsx, and .csv. Save your file in one of these formats to ensure compatibility.
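
If you prepare files with a script, a quick pre-upload check can catch unsupported formats early. This Python sketch only compares the file name's extension (case-insensitively) with the three formats listed above; it does not inspect the file's contents, and `is_accepted_format` is a hypothetical helper.

```python
from pathlib import Path

# Spreadsheet formats Formula Bot accepts, per this article.
ACCEPTED_EXTENSIONS = {".xls", ".xlsx", ".csv"}


def is_accepted_format(filename):
    """Return True if the file name ends in an accepted extension (any case)."""
    return Path(filename).suffix.lower() in ACCEPTED_EXTENSIONS


for name in ["sales.xlsx", "Sales.CSV", "report.numbers", "data.csv.txt"]:
    print(name, is_accepted_format(name))
```

Note that only the last extension counts, so a file named data.csv.txt is rejected.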

Uniform Data Types

Each column in your dataset should ideally represent a single data type (e.g., all numbers, all dates, all strings). Mixing data types within a single column can result in errors or misinterpretations.

Improper Formatting:

Column A:

  • 25

  • John

  • 07/12/2020

Proper Formatting:

  • Column A (Names): John, Alice, Bob...

  • Column B (Numbers): 25, 30, 45...

  • Column C (Dates): 07/12/2020, 08/12/2020...
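
To find mixed-type columns before uploading, you can classify each cell and flag columns that contain more than one kind of value. This is a rough Python sketch: the date formats it recognizes and the simple number pattern are assumptions you would adjust to your data, blank cells are ignored, and `classify` and `mixed_type_columns` are hypothetical names.

```python
import re
from datetime import datetime

# Date formats to recognize; this list is an assumption, adjust it to your data.
DATE_FORMATS = ["%m/%d/%Y", "%d/%m/%Y", "%Y-%m-%d"]
# Optional minus sign, digits, optional decimal part (checked after commas are removed).
NUMBER_PATTERN = re.compile(r"-?\d+(\.\d+)?")


def classify(value):
    """Return 'number', 'date', or 'text' for a cell, or None if it is blank."""
    value = value.strip()
    if not value:
        return None
    if NUMBER_PATTERN.fullmatch(value.replace(",", "")):
        return "number"
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return "date"
        except ValueError:
            continue
    return "text"


def mixed_type_columns(header, rows):
    """Map each column name that holds more than one kind of value to those kinds."""
    mixed = {}
    for i, name in enumerate(header):
        kinds = {classify(row[i]) for row in rows if i < len(row)} - {None}
        if len(kinds) > 1:
            mixed[name] = kinds
    return mixed


improper = mixed_type_columns(["Column A"], [["25"], ["John"], ["07/12/2020"]])
proper = mixed_type_columns(
    ["Names", "Numbers", "Dates"],
    [["John", "25", "07/12/2020"], ["Alice", "30", "08/12/2020"], ["Bob", "45", "09/12/2020"]],
)
print({name: sorted(kinds) for name, kinds in improper.items()})  # {'Column A': ['date', 'number', 'text']}
print(proper)  # {}
```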

Improper Data Formatting

Formula Bot might struggle with inconsistent spreadsheet structures or data that isn't formatted correctly. Here are some typical problems to steer clear of:

❌ Multiple sections. Your spreadsheet must contain a single table, not multiple separate sections.

❌ Empty rows or columns. Make sure there are no empty rows or columns.

❌ Non-tabular data. Non-tabular data doesn't fit neatly into rows and columns the way data in tables, spreadsheets, or databases does. Make sure your data is tabular.
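
Stray blank rows and columns are easy to remove programmatically. This Python sketch assumes the sheet has been exported to CSV and read with the standard csv module; it drops every row and every column in which all cells are blank. `drop_empty_rows_and_columns` is a hypothetical helper.

```python
import csv
import io


def drop_empty_rows_and_columns(rows):
    """Remove rows and columns whose cells are all blank."""
    # Keep rows that have at least one non-blank cell.
    rows = [row for row in rows if any(cell.strip() for cell in row)]
    if not rows:
        return []
    width = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    rows = [row + [""] * (width - len(row)) for row in rows]
    # Keep the indexes of columns that have a non-blank cell in some row.
    keep = [i for i in range(width) if any(row[i].strip() for row in rows)]
    return [[row[i] for i in keep] for row in rows]


raw = "Name,,Age\nJohn,,25\n,,\nAlice,,30\n"
rows = list(csv.reader(io.StringIO(raw)))
cleaned = drop_empty_rows_and_columns(rows)
print(cleaned)  # [['Name', 'Age'], ['John', '25'], ['Alice', '30']]
```

If a blank row separates two different tables rather than sitting inside one, deleting it would glue the tables together; split them into separate files instead.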

Proper Data Formatting

Clean, well-structured data helps Formula Bot produce accurate data analysis and visualizations. Follow these best practices:

✅ Tabular format. Your data should be in the form of rows of records below the headers.

✅ Column headers. Descriptive column headers must be in the first row and no other row. All data must fall immediately under the header row.
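
A few of these rules can be checked automatically. This Python sketch treats the first row as the header and reports blank or duplicate column names, plus any data row whose cell count doesn't match the header. The duplicate-name check is our own assumption about what "descriptive" implies, and `check_table` is a hypothetical helper.

```python
def check_table(rows):
    """List problems in a table whose first row is meant to be the only header row."""
    if not rows:
        return ["file is empty"]
    problems = []
    header = [cell.strip() for cell in rows[0]]
    if any(not name for name in header):
        problems.append("header row has blank column names")
    duplicates = sorted({name for name in header if name and header.count(name) > 1})
    if duplicates:
        problems.append("duplicate column names: " + ", ".join(duplicates))
    # Row 1 is the header, so data rows are numbered from 2.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"row {number} has {len(row)} cells, header has {len(header)}")
    return problems


good = check_table([["Name", "Age"], ["John", "25"], ["Alice", "30"]])
bad = check_table([["Name", "Age", "Age"], ["John", "25"], ["Alice", "30", "x"]])
print(good)  # []
print(bad)   # ['duplicate column names: Age', 'row 2 has 2 cells, header has 3']
```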

Consistent Formatting

Maintain a consistent format, especially for dates and numbers. Choose a date format (e.g., DD/MM/YYYY or MM-DD-YYYY) and a number format (e.g., 1,000 or 1000), and use them consistently throughout the dataset.
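
One way to enforce a single date and number format is to rewrite every value in a column before uploading. In this Python sketch you must tell `normalize_date` which formats the column was written in, because a value like 07/12/2020 is ambiguous (July 12 or 7 December) and can't be detected reliably. The output format DD/MM/YYYY is one of the examples above, and `normalize_number` assumes commas are thousands separators, which is not true in locales that use a decimal comma. Both functions are hypothetical helpers.

```python
from datetime import datetime


def normalize_date(value, input_formats, output_format="%d/%m/%Y"):
    """Rewrite a date written in one of input_formats using output_format."""
    for fmt in input_formats:
        try:
            return datetime.strptime(value.strip(), fmt).strftime(output_format)
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognized date: {value!r}")


def normalize_number(value):
    """Remove thousands separators so '1,000' and '1000' read the same."""
    return value.strip().replace(",", "")


# This column mixes US-style dates (month first) with ISO dates.
source_formats = ["%m/%d/%Y", "%m-%d-%Y", "%Y-%m-%d"]
dates = [normalize_date(d, source_formats) for d in ["07/12/2020", "09-01-2020", "2020-08-15"]]
numbers = [normalize_number(n) for n in ["1,000", "2500", " 3,250.50 "]]
print(dates)    # ['12/07/2020', '01/09/2020', '15/08/2020']
print(numbers)  # ['1000', '2500', '3250.50']
```

Raising an error on an unrecognized date, rather than guessing, keeps a bad value from silently entering the cleaned column.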

No Merged Cells

Merged cells can confuse the AI. Always unmerge cells and make sure each value is in its own cell.
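
When you unmerge cells in Excel, the value stays in the top-left cell and the other cells become blank, so a merged "Region" label covering several rows ends up on only the first of them. A common fix is to fill the value down. This Python sketch assumes the data is a list of rows with the header first; `fill_down` is a hypothetical helper, and it should only be used on columns where the blanks came from merged cells, since it would also overwrite genuinely missing values.

```python
def fill_down(rows, column):
    """Fill blank cells in one column with the nearest value above them.

    rows[0] is the header row; the input rows are not modified.
    """
    header = rows[0]
    idx = header.index(column)
    filled = [list(header)]
    last_value = ""
    for row in rows[1:]:
        row = list(row)  # copy so the caller's data stays unchanged
        if row[idx].strip():
            last_value = row[idx]
        else:
            row[idx] = last_value
        filled.append(row)
    return filled


rows = [
    ["Region", "Store", "Sales"],
    ["East", "A", "100"],
    ["", "B", "120"],  # was part of a merged "East" cell
    ["West", "C", "90"],
    ["", "D", "80"],   # was part of a merged "West" cell
]
result = fill_down(rows, "Region")
print([row[0] for row in result[1:]])  # ['East', 'East', 'West', 'West']
```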

Avoid Special Characters

Special characters, particularly in column headers, can sometimes be misinterpreted by the AI. Stick to alphanumeric characters where possible.
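
If you rename headers with a script, you can replace special characters with words and collapse everything else into underscores. In this Python sketch the symbol-to-word mapping is an arbitrary example and the underscore separator is our choice; `clean_header` is a hypothetical helper. Python's `\W` class treats accented letters such as é as word characters, so they are kept.

```python
import re

# Example symbol-to-word replacements so meaning isn't lost; adjust as needed.
SYMBOL_WORDS = {"%": " percent ", "$": " usd ", "#": " number ", "&": " and "}


def clean_header(name):
    """Turn a header into letters, digits, and single underscores."""
    for symbol, word in SYMBOL_WORDS.items():
        name = name.replace(symbol, word)
    # Collapse each run of non-word characters (and underscores) into one underscore.
    name = re.sub(r"[\W_]+", "_", name)
    return name.strip("_")


for header in ["Revenue ($)", "% Growth", "Order #", "R&D Spend", "Café Sales"]:
    print(clean_header(header))
```

After cleaning, check that two different headers haven't collapsed into the same name.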

Handling Missing Data

Decide on a consistent way to represent missing data. Common methods include leaving the cell blank or using a placeholder like "N/A".
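
To apply one convention throughout a file, you can map the different spellings of "missing" to a single placeholder. The set of markers in this Python sketch is an assumption (a lone "-" may be a real value in some data), and `normalize_missing` is a hypothetical helper; pass `placeholder=""` to leave missing cells blank instead.

```python
# Spellings treated as "no value"; this set is an assumption, adjust it to your data.
MISSING_MARKERS = {"", "na", "n/a", "null", "-"}


def normalize_missing(rows, placeholder=""):
    """Replace every missing-value spelling in the data rows with one placeholder.

    rows[0] is the header row and is kept as-is.
    """
    cleaned = [list(rows[0])]
    for row in rows[1:]:
        cleaned.append(
            [placeholder if cell.strip().lower() in MISSING_MARKERS else cell for cell in row]
        )
    return cleaned


rows = [
    ["Name", "Age", "City"],
    ["John", "N/A", "Boston"],
    ["Alice", "", "-"],
    ["Bob", "null", "Denver"],
]
normalized = normalize_missing(rows, placeholder="N/A")
for row in normalized:
    print(row)
```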

Encoding Error

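Upload problems or garbled characters can be caused by the file's encoding. Make sure the file is saved with UTF-8 encoding; when saving a CSV from Excel, choose the "CSV UTF-8 (Comma delimited)" file type.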
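
For CSV files, which are plain text, you can check whether a file is valid UTF-8 and re-save it if not. This Python sketch assumes the original file was written in Windows-1252 (cp1252), a common encoding for CSVs created on Windows in US and Western European locales; if your file uses a different encoding, pass that instead. `is_utf8` and `convert_to_utf8` are hypothetical helpers.

```python
import os
import tempfile


def is_utf8(path):
    """Return True if the whole file decodes as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-save a text file such as a CSV as UTF-8.

    source_encoding is an assumption; use the encoding the file was actually saved in.
    newline="" keeps the original line endings unchanged.
    """
    with open(src, encoding=source_encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)


# Demo: write a small Windows-1252 CSV, check it, convert it, check again.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "original.csv")
    dst = os.path.join(tmp, "original_utf8.csv")
    with open(src, "wb") as f:
        f.write("Name,City\r\nJosé,Zürich\r\n".encode("cp1252"))
    before = is_utf8(src)
    convert_to_utf8(src, dst)
    after = is_utf8(dst)
    with open(dst, encoding="utf-8", newline="") as f:
        converted_text = f.read()

print(before, after)  # False True
```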