mysql_lib API documentation

class mysql_lib.mysql_connection(global_config)[source]

Bases: object

connect_db()[source]

Connects to the database and creates a dictionary cursor

connect_db_ubf()[source]

Connects to the database and creates an unbuffered cursor

disconnect_db()[source]

Disconnects the dictionary connection

disconnect_db_ubf()[source]

Disconnects the unbuffered connection

class mysql_lib.mysql_engine(global_config, logger)[source]

Bases: object

check_mysql_config()[source]

The method checks if the MySQL configuration is compatible with the replica requirements. If all the configuration requirements are met the return value is True, otherwise it is False. The parameters checked are:
  • log_bin – must be ON, meaning the binary log is enabled
  • binlog_format – must be ROW, otherwise the replica won’t get the data
  • binlog_row_image – must be FULL, otherwise the row image will be incomplete

Returns:True if all the requirements are met, False if not
Return type:boolean
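
A minimal sketch of the kind of check performed, assuming a PyMySQL dictionary cursor is available as cursor (the function name and structure are illustrative, not the library’s actual code):

    def check_mysql_config(cursor):
        # Read the relevant global variables and verify the values
        # required by the replica.
        cursor.execute(
            "SHOW GLOBAL VARIABLES WHERE Variable_name IN "
            "('log_bin', 'binlog_format', 'binlog_row_image');")
        config = {row["Variable_name"]: row["Value"] for row in cursor.fetchall()}
        return (
            config.get("log_bin") == "ON"
            and config.get("binlog_format") == "ROW"
            and config.get("binlog_row_image") == "FULL"
        )
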
copy_table_data(pg_engine, copy_max_memory, lock_tables=True)[source]

Copies the table data from MySQL to PostgreSQL. The process determines the estimated optimal slice size using copy_max_memory and the avg_row_length from MySQL’s information_schema.TABLES. If the table contains no rows the slice size is set to a reasonably high value (100,000) in order to get the table copied in one slice. The estimated number of slices is derived from the slice size. Then generate_select is used to build the csv and insert columns for the table.

An unbuffered cursor is used to pull the data from MySQL in the CSV format. fetchmany is called with copy_limit (the slice size) to pull the rows into a file object. The copy_mode determines whether to use a file (out_file) or an in memory file object (io.StringIO()). If there are no more rows the loop exits, otherwise it continues with the next slice. When the slice is saved the method pg_engine.copy_data is executed to load the data into the PostgreSQL table. If an error occurs the slice number is saved into the list slice_insert and, after all the slices are copied, the fallback procedure insert_table_data processes the remaining slices using inserts.

Parameters:
  • pg_engine – the postgresql engine
  • copy_max_memory – The estimated maximum amount of memory to use in a single slice copy
  • lock_tables – Specifies whether the tables should be locked before copying the data
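
A rough sketch of the slicing logic described above, assuming avg_row_length and table_rows have already been read from information_schema.TABLES (names are illustrative):

    import math

    def estimate_slices(copy_max_memory, avg_row_length, table_rows):
        # Estimate how many rows fit in copy_max_memory and how many
        # slices are needed to copy the whole table.
        if avg_row_length > 0:
            copy_limit = max(int(copy_max_memory // avg_row_length), 1)
        else:
            copy_limit = 100000     # no rows: copy the table in one slice
        num_slices = int(math.ceil(table_rows / float(copy_limit))) if table_rows else 1
        return copy_limit, num_slices

The copy loop then calls fetchmany(copy_limit) on the unbuffered cursor, writes each slice to the chosen file object and hands it to pg_engine.copy_data until no more rows are returned.
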
generate_select(table_columns, mode='csv')[source]

The method builds the select list using the dictionary table_columns which is the columns key in the table’s metadata. The method can build a select list for a CSV output or an INSERT output. The default mode is csv.

Parameters:
  • table_columns – The table’s column dictionary with the column metadata
  • mode – the select mode, csv or insert
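
A simplified sketch of how such a select list could be assembled; the column metadata keys (column_name, data_type) and the hex handling shown here are assumptions, not the library’s exact expressions:

    def generate_select(table_columns, mode="csv"):
        # Wrap binary-like columns in HEX() and return a comma
        # separated select list usable for the CSV or insert copy.
        select_items = []
        for column in table_columns:
            name = column["column_name"]
            if column["data_type"] in ("blob", "binary", "varbinary"):
                select_items.append("hex(`%s`)" % name)
            else:
                select_items.append("`%s`" % name)
        return ",".join(select_items)
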
get_column_metadata(table)[source]

The method extracts the column metadata for a specific table. The select also builds the field list formatted for the CSV copy or for a single insert copy. The data types included in hexlify are hex encoded on the fly.

Parameters:table – The table name.

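A sketch of the kind of information_schema query involved, shown for illustration (the real select also builds the CSV and insert field lists):

    sql_columns = """
        SELECT column_name,
               column_default,
               data_type,
               character_maximum_length,
               numeric_precision,
               numeric_scale,
               extra,
               column_key
          FROM information_schema.COLUMNS
         WHERE table_schema = %s
           AND table_name = %s
         ORDER BY ordinal_position;
    """
    cursor.execute(sql_columns, (my_database, table))
    column_metadata = cursor.fetchall()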

get_fk_metadata()[source]

The method collects the foreign key metadata for the detach replica process. Currently it does not collect the ON UPDATE/ON DELETE clauses.
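
A sketch of a query that gathers this information from information_schema, given as an illustration of the approach rather than the library’s exact statement:

    sql_fkeys = """
        SELECT constraint_name,
               table_name,
               column_name,
               referenced_table_name,
               referenced_column_name
          FROM information_schema.KEY_COLUMN_USAGE
         WHERE table_schema = %s
           AND referenced_table_name IS NOT NULL
         ORDER BY constraint_name, ordinal_position;
    """
    cursor.execute(sql_fkeys, (my_database,))
    fk_metadata = cursor.fetchall()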

get_index_metadata(table)[source]

The method extracts the index metadata for a specific table. The select searches only for the BTREE indices using the information_schema.statistics table.

Parameters:table – The table name.
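
For illustration, a statistics query restricted to BTREE indices could look like this (a sketch, not the library’s exact select):

    sql_index = """
        SELECT index_name,
               non_unique,
               seq_in_index,
               column_name
          FROM information_schema.statistics
         WHERE table_schema = %s
           AND table_name = %s
           AND index_type = 'BTREE'
         ORDER BY index_name, seq_in_index;
    """
    cursor.execute(sql_index, (my_database, table))
    index_metadata = cursor.fetchall()
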
get_master_status()[source]

The method gets the master’s coordinates using the command SHOW MASTER STATUS. The output of the dictionary cursor is stored in the class variable self.master_status.
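
For example, with a dictionary cursor the coordinates can be read like this (a sketch of the idea):

    cursor.execute("SHOW MASTER STATUS;")
    master_status = cursor.fetchall()
    # e.g. [{'File': 'mysql-bin.000002', 'Position': 1234, ...}]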

get_table_metadata()[source]

The method collects the metadata for all the tables in the MySQL schema specified in self.my_database using get_column_metadata and get_index_metadata. If there are tables in tables_limit the variable table_include is set in order to limit the results to those tables only. The information is stored in a dictionary. The key columns stores the column metadata, the key indices stores the index metadata and the key name stores the table name. The dictionary is then saved in the class dictionary self.my_tables with the key set to the table name.
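
The resulting structure stored in self.my_tables is conceptually something like the following (an illustrative sketch; the real entries carry more detail):

    my_tables = {
        "employees": {
            "name": "employees",
            "columns": [],   # column metadata from get_column_metadata
            "indices": [],   # index metadata from get_index_metadata
        },
        # one entry per table found in self.my_database
    }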

get_table_type_map()[source]

The method builds a dictionary keyed by table name and carrying the column/type mappings. The dictionary is used in read_replica to determine whether a field requires hex encoding.
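
Conceptually the mapping looks like this (illustrative example):

    table_type_map = {
        "employees": {
            "emp_photo": "blob",      # requires hex encoding in read_replica
            "emp_name": "varchar",
            "birth_date": "date",
        },
        # one entry per table, mapping column name to MySQL data type
    }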

insert_table_data(pg_engine, ins_arg)[source]

This method is a fallback procedure used when copy_table_data fails. The ins_arg parameter is a list with the information required to run the select for building the insert statements, along with the slices’ start and stop positions. The process is performed in memory and can take a very long time to complete.

Parameters:
  • pg_engine – the postgresql engine
  • ins_arg – the list with the insert arguments (slice_insert, table_name, columns_insert, slice_size)
lock_tables()[source]

The method locks the tables using FLUSH TABLES WITH READ LOCK. The tables locked are limited to the tables found by get_table_metadata. After locking the tables the method gets the master’s coordinates with get_master_status.
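
A minimal sketch of the locking sequence, assuming the table list has already been built from get_table_metadata (identifiers are illustrative):

    table_list = "`employees`,`departments`"    # tables found by get_table_metadata
    cursor.execute("FLUSH TABLES %s WITH READ LOCK;" % table_list)
    cursor.execute("SHOW MASTER STATUS;")
    master_status = cursor.fetchall()
    # ... the initial copy runs while the lock is held ...
    cursor.execute("UNLOCK TABLES;")            # what unlock_tables does afterwards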

print_progress(iteration, total, table_name)[source]

Prints the copy progress. As the variable total is estimated on InnoDB, the percentage progress can exceed 100%. In order to reduce noise when the log level is info, only the tables copied in multiple slices get the progress output.

Parameters:
  • iteration – The slice number currently processed
  • total – The estimated total slices
  • table_name – The table name
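
A minimal sketch of this kind of progress output (illustrative only):

    def print_progress(iteration, total, table_name):
        # total is estimated, so the percentage may exceed 100% on InnoDB.
        percentage = 100.0 * iteration / total if total else 100.0
        print("copied %.1f%% of table %s (slice %s of ~%s)"
              % (percentage, table_name, iteration, total))
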
read_replica(batch_data, pg_engine)[source]

Streams the replica using the batch data. This method evaluates the different events streamed from MySQL and manages them accordingly. The BinLogStreamReader function is called with the only_events parameter, which restricts the event types received by the streamer. The events managed are the following:
  • RotateEvent – happens when MySQL restarts or the binary log file changes.
  • QueryEvent – happens when a new row image comes in (BEGIN statement) or a DDL is executed. The BEGIN is always skipped. The DDL is parsed using the sql_token class.
  • [Write,Update,Delete]RowEvents – the row images pulled from the MySQL replica.

The RotateEvent and the QueryEvent cause the batch to be closed.

The for loop reads the row events, builds the dictionary carrying information like the destination schema and the binlog coordinates, and stores it into the group_insert list. When the number of events exceeds replica_batch_size the group_insert is written into PostgreSQL. The batch is not closed in that case and the method exits only if there are no more rows available in the stream. Therefore replica_batch_size is just the maximum size of the single insert and of the batch replayed on PostgreSQL. The binlog switch or a captured DDL determines whether a batch is closed and processed.

The update row event stores the row image before the update in a separate key, event_update. This is required to allow updates where the primary key is updated as well.

Each row event is scanned for data types requiring conversion to hex string.

Parameters:
  • batch_data – The list with the master’s batch data.
  • pg_engine – The postgresql engine object required for writing the rows in the log tables
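
A condensed sketch of a streaming loop built with python-mysql-replication; the connection settings, server_id, starting coordinates and batching details are placeholders, not the library’s actual values:

    from pymysqlreplication import BinLogStreamReader
    from pymysqlreplication.event import QueryEvent, RotateEvent
    from pymysqlreplication.row_event import (
        DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent)

    mysql_conn = {"host": "localhost", "port": 3306, "user": "replica", "passwd": "secret"}
    batch_log_file = "mysql-bin.000002"     # master coordinates from the batch data
    batch_log_pos = 1234

    stream = BinLogStreamReader(
        connection_settings=mysql_conn,
        server_id=100,
        only_events=[RotateEvent, QueryEvent,
                     WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
        log_file=batch_log_file,
        log_pos=batch_log_pos,
        resume_stream=True,
    )
    group_insert = []
    for binlogevent in stream:
        if isinstance(binlogevent, RotateEvent):
            pass    # binlog switch: close the batch
        elif isinstance(binlogevent, QueryEvent):
            pass    # skip BEGIN, parse DDL with sql_token, close the batch
        else:
            for row in binlogevent.rows:
                group_insert.append(row)    # flushed to PostgreSQL in batches
    stream.close()
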
run_replica(pg_engine)[source]

Runs a MySQL replica read attempt and stores the data in PostgreSQL if found. The method first checks if there is a reindex in progress using pg_engine.check_reindex, then gets the batch data from PostgreSQL. If the batch data is not empty the method read_replica is executed. When the method exits the replica_data list is decomposed into the master_data (log name, position and last event’s timestamp). If the flag close_batch is set the master status is saved in PostgreSQL and the new batch id is saved in the mysql_engine class variable id_batch. This variable is used to determine whether the old batch should be closed or not. If the variable is not empty the previous batch is closed by updating the processed flag on the replica catalogue. Before exiting the method calls pg_engine.process_batch to replay the changes in PostgreSQL.

Parameters:pg_engine – The postgresql engine object required for storing the master coordinates and replaying the batches
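
A high level sketch of the flow described above. The pg_engine accessor names other than check_reindex and process_batch, the return shape of read_replica and the call signatures are assumptions made for illustration, not the library’s API:

    def run_replica(self, pg_engine):
        # Skip the attempt if a reindex is in progress on PostgreSQL.
        if pg_engine.check_reindex():
            return
        batch_data = pg_engine.get_batch_data()            # hypothetical accessor
        if batch_data:
            master_data, close_batch = self.read_replica(batch_data, pg_engine)
            if close_batch:
                old_id_batch = self.id_batch
                # save the master status and keep the new batch id
                self.id_batch = pg_engine.save_master_status(master_data)   # hypothetical
                if old_id_batch:
                    # close the previous batch on the replica catalogue
                    pg_engine.set_batch_processed(old_id_batch)             # hypothetical
        # replay the stored changes on PostgreSQL before returning
        pg_engine.process_batch()                           # arguments omitted
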
unlock_tables()[source]

The method unlocks the tables previously locked by lock_tables