Documentation#
The Sqlconnector class in this module is designed to aid database interactions using SQLAlchemy. It includes methods for establishing database connections, executing SQL queries and commands, and retrieving query results as pandas DataFrames. The class can be configured using either a configuration file (sqlconnect.yaml) or a dictionary.
- Classes:
Sqlconnector: A class to handle SQL database connections and operations.
- Dependencies:
pandas: Used for handling query results as DataFrames.
sqlalchemy: Required for database connection and query execution.
pathlib: Utilised for handling file paths.
sqlconnect.config: A custom module for handling configuration details.
Example
>>> import sqlconnect as sc
>>>
>>> connection = sc.Sqlconnector("My_Database")
>>>
>>> df = connection.sql_to_df("path/to/sql_query.sql") # Assign results of a query to a DataFrame
>>>
>>> print(df.describe()) # Explore the dataframe with Pandas
- class sqlconnect.connector.Sqlconnector(connection_name: str, config_path: str = None, config_dict: dict = None)[source]#
A class to handle SQL database connections and operations.
This class provides methods to connect to a SQL database using SQLAlchemy, execute SQL queries, and perform database operations.
- Parameters:
connection_name (str) – The name of the connection to be used. This name should correspond to an entry in sqlconnect.yaml or dictionary.
config_path (str, optional) – The file path of sqlconnect.yaml. If not provided, the current directory or home directory is used.
config_dict (dict, optional) – A dictionary containing database connection configurations. If provided, it overrides the configurations from the file specified in config_path.
- connection_name#
The name of the connection.
- Type:
str
- engine#
The SQLAlchemy engine object used for database connections.
- Type:
sqlalchemy.engine.Engine
- df_to_sql(df: DataFrame, name: str, schema: str = None, if_exists: str = 'fail', index: bool = True, index_label=None, chunksize: int = None, dtype=None, method=None) int | None[source]#
Write a pandas DataFrame to a SQL database table.
This method allows for explicit control over how the DataFrame is written to the SQL table. The parameters correspond to those in pandas.DataFrame.to_sql, allowing for a detailed specification of the SQL writing process. See the pandas documentation for more details: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html
- Parameters:
df (pandas.DataFrame) – The DataFrame to be written to the SQL table.
name (str) – Name of the SQL table to which the DataFrame should be written.
schema (str, optional) – Specify the schema (if database flavor supports this). If None, use default schema.
if_exists (str, default 'fail') – How to behave if the table already exists. Values include ‘fail’, ‘replace’, ‘append’.
index (bool, default True) – Write DataFrame index as a column. Uses index_label as the column name in the table.
index_label (str or sequence, optional) – Column label for index columns(s). If None is given and index is True, then the index names are used.
chunksize (int, optional) – Specify the number of rows in each batch to be written at a time. By default, all rows will be written at once.
dtype (dict or scalar, optional) – Specifying the datatype for columns. If a dictionary is used, the keys should be column names and the values should be SQLAlchemy types.
method ({None, 'multi', callable}, optional) – Controls the SQL insertion clause used.
- Returns:
The number of rows inserted, if known, otherwise None.
- Return type:
Union[int, None]
- Raises:
RuntimeError – If there is an error in writing to the SQL table.
TypeError – If the provided DataFrame or table name is not of the correct type.
Examples
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}) >>> connection.df_to_sql(df, 'table_name', if_exists='append', index=False) This will write the DataFrame 'df' to the 'table_name' table in the connected SQL database, appending the data without including the index.
- execute_sql(sql_path: str) None[source]#
Execute a SQL command from a file.
- Parameters:
sql_path (str) – The file path of the SQL command to be executed.
- Raises:
RuntimeError – If there is an error in reading the file or executing the SQL command. This includes file not found errors and other general exceptions.
- execute_sql_str(command: str) None[source]#
Execute a SQL command from a string.
- Parameters:
command (str) – The SQL command to be executed.
- Raises:
Exception – If there is an error in executing the command.
- sql_to_df(query_path: str, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None, dtype=None) DataFrame | Generator[DataFrame, None, None][source]#
Execute a SQL query from a file and return the results in a pandas DataFrame. This method allows additional keyword arguments that are passed directly to pandas.read_sql_query from the pandas library https://pandas.pydata.org/docs/reference/api/pandas.read_sql_query.html
- Parameters:
query_path (str) – The file path of the SQL query to be executed.
index_col (str or list of str, optional, default: None) – Column(s) to set as index(MultiIndex).
coerce_float (bool, default True) – Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point.
params (list, tuple or dict, optional, default: None) – List of parameters to pass to execute method.
parse_dates (list or dict, default: None) –
List of column names to parse as dates.
Dict of {column_name: format string} where format string is strftime compatible in case of parsing string times or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.
chunksize (int, optional) – Return Pandas DataFrames as a generator.
dtype (Type name or dict of column -> type, optional) – Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32, ‘c’: ‘Int64’}.
- Returns:
A DataFrame containing the results of the SQL query, or a generator yielding DataFrames if ‘chunksize’ is specified.
- Return type:
Union[pandas.DataFrame, Generator[pandas.DataFrame, None, None]]
- Raises:
RuntimeError – If the file cannot be found or if there is an error in executing the query.
TypeError – If the provided query_path is not a string
Examples
>>> # This will execute the SQL query and return a DataFrame, fetching 1000 rows at a time. >>> df = connection.sql_to_df("path/to/sql_query.sql", chunksize=1000)
- sql_to_df_str(query: str, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None, dtype=None) DataFrame | Generator[DataFrame, None, None][source]#
Execute a SQL query from a string and return the results in a pandas DataFrame. This method allows additional keyword arguments that are passed directly to pandas.read_sql_query from the pandas library https://pandas.pydata.org/docs/reference/api/pandas.read_sql_query.html
- Parameters:
query (str) – The SQL query to be executed.
index_col (str or list of str, optional, default: None) – Column(s) to set as index(MultiIndex).
coerce_float (bool, default True) – Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point.
params (list, tuple or dict, optional, default: None) – List of parameters to pass to execute method.
parse_dates (list or dict, default: None) –
List of column names to parse as dates.
Dict of {column_name: format string} where format string is strftime compatible in case of parsing string times or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.
chunksize (int, optional) – Return Pandas DataFrames as a generator.
dtype (Type name or dict of column -> type, optional) – Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32, ‘c’: ‘Int64’}.
- Returns:
A DataFrame containing the results of the SQL query, or a generator yielding DataFrames if ‘chunksize’ is specified.
- Return type:
Union[pandas.DataFrame, Generator[pandas.DataFrame, None, None]]
- Raises:
RuntimeError – If there is an error in executing the query.
TypeError – If the provided query is not a string
Examples
>>> # This will execute the SQL query and return a DataFrame, fetching 1000 rows at a time. >>> df = connection.sql_to_df_str("SELECT * FROM company.employees", chunksize=1000)